draft-ietf-p2psip-self-tuning-14.txt   draft-ietf-p2psip-self-tuning-15.txt 
P2PSIP Working Group J. Maenpaa P2PSIP Working Group J. Maenpaa
Internet-Draft G. Camarillo Internet-Draft G. Camarillo
Intended status: Standards Track Ericsson Intended status: Standards Track Ericsson
Expires: December 18, 2014 June 16, 2014 Expires: December 28, 2014 June 26, 2014
Self-tuning Distributed Hash Table (DHT) for REsource LOcation And Self-tuning Distributed Hash Table (DHT) for REsource LOcation And
Discovery (RELOAD) Discovery (RELOAD)
draft-ietf-p2psip-self-tuning-14.txt draft-ietf-p2psip-self-tuning-15.txt
Abstract Abstract
REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P)
signaling protocol that provides an overlay network service. Peers signaling protocol that provides an overlay network service. Peers
in a RELOAD overlay network collectively run an overlay algorithm to in a RELOAD overlay network collectively run an overlay algorithm to
organize the overlay, and to store and retrieve data. This document organize the overlay, and to store and retrieve data. This document
describes how the default topology plugin of RELOAD can be extended describes how the default topology plugin of RELOAD can be extended
to support self-tuning, that is, to adapt to changing operating to support self-tuning, that is, to adapt to changing operating
conditions such as churn and network size. conditions such as churn and network size.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 18, 2014. This Internet-Draft will expire on December 28, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 13 skipping to change at page 2, line 13
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5
3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 5 3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 6
3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6 3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6
3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 8 3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 8
4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8 4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8
5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9
5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10 5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10
5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 11 5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 11
5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11 5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11
5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 11 5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 12
5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 12 5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 12
5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12 5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12
6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12 6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12
6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 12 6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 13
6.2. Determining Routing Table Size . . . . . . . . . . . . . 13 6.2. Determining Routing Table Size . . . . . . . . . . . . . 13
6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 13 6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 14
6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14 6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14
6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15 6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15
6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 15 6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 16
6.6. Calculating the Stabilization Interval . . . . . . . . . 17 6.6. Calculating the Stabilization Interval . . . . . . . . . 17
7. Overlay Configuration Document Extension . . . . . . . . . . 18 7. Overlay Configuration Document Extension . . . . . . . . . . 18
8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 8. Security Considerations . . . . . . . . . . . . . . . . . . . 18
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 19 9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 19
9.2. A New IETF XML Registry . . . . . . . . . . . . . . . . . 19 9.2. New Overlay Algorithm Type . . . . . . . . . . . . . . . 19
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19 9.3. A New IETF XML Registry . . . . . . . . . . . . . . . . . 19
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20
11.1. Normative References . . . . . . . . . . . . . . . . . . 19 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 20
11.1. Normative References . . . . . . . . . . . . . . . . . . 20
11.2. Informative References . . . . . . . . . . . . . . . . . 20 11.2. Informative References . . . . . . . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction 1. Introduction
REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer
signaling protocol that can be used to maintain an overlay network, signaling protocol that can be used to maintain an overlay network,
and to store data in and retrieve data from the overlay. For and to store data in and retrieve data from the overlay. For
interoperability reasons, RELOAD specifies one overlay algorithm, interoperability reasons, RELOAD specifies one overlay algorithm,
called chord-reload, that is mandatory to implement. This document called chord-reload, that is mandatory to implement. This document
skipping to change at page 3, line 24 skipping to change at page 3, line 26
successor set, size of the routing table, and rate of maintenance successor set, size of the routing table, and rate of maintenance
messages. The problem with this approach is that it is not possible messages. The problem with this approach is that it is not possible
to achieve a low failure rate and a low communication overhead by to achieve a low failure rate and a low communication overhead by
using fixed parameters. Instead, a better approach is to allow the using fixed parameters. Instead, a better approach is to allow the
system to take into account the evolution of network conditions and system to take into account the evolution of network conditions and
adapt to them. adapt to them.
This document extends the mandatory-to-implement chord-reload This document extends the mandatory-to-implement chord-reload
algorithm by making it self-tuning. The use of the self-tuning algorithm by making it self-tuning. The use of the self-tuning
feature is optional. However, when used, it needs to be supported by feature is optional. However, when used, it needs to be supported by
all peers in the RELOAD overlay network. Two main advantages of all peers in the RELOAD overlay network. The fact that a RELOAD
self-tuning are that users no longer need to tune every DHT parameter overlay uses the self-tuning feature is indicated in the RELOAD
correctly for a given operating environment and that the system overlay configuration document using the CHORD-SELF-TUNING algorithm
adapts to changing operating conditions. name specified in Section 9.2 in the topology-plugin element. Two
main advantages of self-tuning are that users no longer need to tune
every DHT parameter correctly for a given operating environment and
that the system adapts to changing operating conditions.
The remainder of this document is structured as follows: Section 2 The remainder of this document is structured as follows: Section 2
provides definitions of terms used in this document. Section 3 provides definitions of terms used in this document. Section 3
discusses alternative approaches to stabilization operations in DHTs, discusses alternative approaches to stabilization operations in DHTs,
including reactive stabilization, periodic stabilization, and including reactive stabilization, periodic stabilization, and
adaptive stabilization. Section 4 gives an introduction to the Chord adaptive stabilization. Section 4 gives an introduction to the Chord
DHT algorithm. Section 5 describes how this document extends the DHT algorithm. Section 5 describes how this document extends the
stabilization routine of the chord-reload algorithm. Section 6 stabilization routine of the chord-reload algorithm. Section 6
describes how the stabilization rate and routing table size are describes how the stabilization rate and routing table size are
calculated in an adaptive fashion. calculated in an adaptive fashion.
skipping to change at page 9, line 39 skipping to change at page 9, line 45
list, finger table, and predecessor list. list, finger table, and predecessor list.
5. Extending Chord-reload to Support Self-tuning 5. Extending Chord-reload to Support Self-tuning
This section describes how the mandatory-to-implement chord-reload This section describes how the mandatory-to-implement chord-reload
algorithm defined in RELOAD base [RFC6940] can be extended to support algorithm defined in RELOAD base [RFC6940] can be extended to support
self-tuning. self-tuning.
The chord-reload algorithm supports both reactive and periodic The chord-reload algorithm supports both reactive and periodic
recovery strategies. When the self-tuning mechanisms defined in this recovery strategies. When the self-tuning mechanisms defined in this
document are used, the periodic recovery strategy MUST be used. document are used, the periodic recovery strategy is used. Further,
Further, chord-reload specifies that at least three predecessors and chord-reload specifies that at least three predecessors and three
three successors need to be maintained. When the self-tuning successors need to be maintained. When the self-tuning mechanisms
mechanisms are used, the appropriate sizes of the successor list and are used, the appropriate sizes of the successor list and predecessor
predecessor list are determined in an adaptive fashion based on the list are determined in an adaptive fashion based on the estimated
estimated network size, as will be described in Section 6. network size, as will be described in Section 6.
As specified in RELOAD base, each peer MUST maintain a stabilization As specified in RELOAD base, each peer maintains a stabilization
timer. When the stabilization timer fires, the peer MUST restart the timer. When the stabilization timer fires, the peer restarts the
timer and carry out the overlay stabilization routine. Overlay timer and carries out the overlay stabilization routine. Overlay
stabilization has four components in chord-reload: stabilization has four components in chord-reload:
1. Update the neighbor table. We refer to this as neighbor 1. Update the neighbor table. We refer to this as neighbor
stabilization. stabilization.
2. Refreshing the finger table. We refer to this as finger 2. Refreshing the finger table. We refer to this as finger
stabilization. stabilization.
3. Adjusting finger table size. 3. Adjusting finger table size.
skipping to change at page 10, line 34 skipping to change at page 10, line 39
intervals can be determined in an adaptive fashion based on the intervals can be determined in an adaptive fashion based on the
operating conditions of the overlay. The subsections below describe operating conditions of the overlay. The subsections below describe
how this document extends the four components of stabilization. how this document extends the four components of stabilization.
5.1. Update Requests 5.1. Update Requests
As described in RELOAD base [RFC6940], the neighbor and finger As described in RELOAD base [RFC6940], the neighbor and finger
stabilization procedures are implemented using Update requests. stabilization procedures are implemented using Update requests.
RELOAD base defines three types of Update requests: 'peer_ready', RELOAD base defines three types of Update requests: 'peer_ready',
'neighbors', and 'full'. Regardless of the type, all Update requests 'neighbors', and 'full'. Regardless of the type, all Update requests
include an 'uptime' field. Since the self-tuning extensions require include an 'uptime' field. The self-tuning extensions require
information on the uptimes of peers in the routing table, the sender information on the uptimes of peers in the routing table. The sender
of an Update request MUST include its current uptime in seconds in of an Update request includes its current uptime (in seconds) in the
the 'uptime' field. 'uptime' field. Regardless of the type, all Update requests MUST
include an 'uptime' field.
When self-tuning is used, each peer decides independently the When self-tuning is used, each peer decides independently the
appropriate size for the successor list, predecessor list and finger appropriate size for the successor list, predecessor list and finger
table. Thus, the 'predecessors', 'successors', and 'fingers' fields table. Thus, the 'predecessors', 'successors', and 'fingers' fields
included in RELOAD Update requests are of variable length. As included in RELOAD Update requests are of variable length. As
specified in RELOAD [RFC6940], variable length fields are on the wire specified in RELOAD [RFC6940], variable length fields are on the wire
preceded by length bytes. In the case of the successor list, preceded by length bytes. In the case of the successor list,
predecessor list, and finger table, there are two length bytes predecessor list, and finger table, there are two length bytes
(allowing lengths up to 2^16-1). The number of NodeId structures (allowing lengths up to 2^16-1). The number of NodeId structures
included in each field can be calculated based on the length bytes included in each field can be calculated based on the length bytes
since the size of a single NodeId structure is 16 bytes. If a peer since the size of a single NodeId structure is 16 bytes. If a peer
receives more entries than fit into its successor list, predecessor receives more entries than fit into its successor list, predecessor
list or finger table, the peer MUST ignore the extra entries. If a list or finger table, the peer MUST ignore the extra entries. A peer
peer receives less entries than it currently has in its own data may also receive less entries than it currently has in its own data
structure, the peer MUST NOT drop the extra entries from its data structure. In that case, it uses the received entries to update only
structure. a subset of the entries in its data structure. As an example, a peer
that has a successor list of size 8 may receive a successor list of
size 4 from its immediate successor. In that case, the received
successor list can only be used to update the first few successors on
the peer's successor list. The rest of the successors will remain
intact.
5.2. Neighbor Stabilization 5.2. Neighbor Stabilization
In the neighbor stabilization operation of chord-reload, a peer In the neighbor stabilization operation of chord-reload, a peer
periodically sends an Update request to every peer in its Connection periodically sends an Update request to every peer in its Connection
Table. In a small, low-churn overlay, the amount of traffic this Table. In a small, low-churn overlay, the amount of traffic this
process generates is typically acceptable. However, in a large-scale process generates is typically acceptable. However, in a large-scale
overlay churning at a moderate or high churn rate, the traffic load overlay churning at a moderate or high churn rate, the traffic load
may no longer be acceptable since the size of the connection table is may no longer be acceptable since the size of the connection table is
large and the stabilization interval relatively short. The self- large and the stabilization interval relatively short. The self-
tuning mechanisms described in this document are especially designed tuning mechanisms described in this document are especially designed
for overlays of the latter type. Therefore, when the self-tuning for overlays of the latter type. Therefore, when the self-tuning
mechanisms are used, each peer MUST send a periodic Update request mechanisms are used, each peer only sends a periodic Update request
only to its first predecessor and first successor on the Chord ring. to its first predecessor and first successor on the Chord ring; it
MUST NOT send Update requests to others.
The neighbor stabilization routine MUST be executed when the The neighbor stabilization routine is executed when the stabilization
stabilization timer fires. To begin the neighbor stabilization timer fires. To begin the neighbor stabilization routine, a peer
routine, a peer MUST send an Update request to its first successor sends an Update request to its first successor and its first
and its first predecessor. The type of the Update request MUST be predecessor. The type of the Update request MUST be 'neighbors'.
'neighbors'. The Update request MUST include the successor and The Update request includes the successor and predecessor lists of
predecessor lists of the sender. If a peer receiving such an Update the sender. If a peer receiving such an Update request learns from
request learns from the predecessor and successor lists included in the predecessor and successor lists included in the request that new
the request that new peers can be included in its neighborhood set, peers can be included in its neighborhood set, it sends Attach
it MUST send Attach requests to the new peers. requests to the new peers.
After a new peer has been added to the predecessor or successor list, After a new peer has been added to the predecessor or successor list,
an Update request of type 'peer_ready' MUST be sent to the new peer. an Update request of type 'peer_ready' is sent to the new peer. This
This allows the new peer to insert the sender into its neighborhood allows the new peer to insert the sender into its neighborhood set.
set.
5.3. Finger Stabilization 5.3. Finger Stabilization
Chord-reload specifies two alternative methods for searching for new Chord-reload specifies two alternative methods for searching for new
peers to the finger table. Both of the alternatives can be used with peers to the finger table. Both of the alternatives can be used with
the self-tuning extensions defined in this document. the self-tuning extensions defined in this document.
Immediately after a new peer has been added to the finger table, a Immediately after a new peer has been added to the finger table, a
Probe request MUST be sent to the new peer to fetch its uptime. The Probe request is sent to the new peer to fetch its uptime. The
requested_info field of the Probe request MUST be set to contain the requested_info field of the Probe request MUST be set to contain the
ProbeInformationType 'uptime' defined in RELOAD base [RFC6940]. ProbeInformationType 'uptime' defined in RELOAD base [RFC6940].
5.4. Adjusting Finger Table Size 5.4. Adjusting Finger Table Size
The chord-reload algorithm defines how a peer can make sure that the The chord-reload algorithm defines how a peer can make sure that the
finger table is appropriately sized to allow for efficient routing. finger table is appropriately sized to allow for efficient routing.
Since the self-tuning mechanisms specified in this document produce a Since the self-tuning mechanisms specified in this document produce a
network size estimate, this estimate can be directly used to network size estimate, this estimate can be directly used to
calculate the optimal size for the finger table. This mechanism MUST calculate the optimal size for the finger table. This mechanism is
be used instead of the one specified by chord-reload. A peer MUST used instead of the one specified by chord-reload. A peer uses the
use the network size estimate to determine whether it needs to adjust network size estimate to determine whether it needs to adjust the
the size of its finger table each time when the stabilization timer size of its finger table each time when the stabilization timer
fires. The way this is done is explained in Section 6.2. fires. The way this is done is explained in Section 6.2.
5.5. Detecting Partitioning 5.5. Detecting Partitioning
This document does not require any changes to the mechanism chord- This document does not require any changes to the mechanism chord-
reload uses to detect network partitioning. reload uses to detect network partitioning.
5.6. Leaving the Overlay 5.6. Leaving the Overlay
As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a
Leave request to all members of its neighbor table prior to leaving Leave request to all members of its neighbor table prior to leaving
the overlay. The overlay_specific_data field MUST contain the the overlay. The overlay_specific_data field MUST contain the
ChordLeaveData structure. The Leave requests that are sent to ChordLeaveData structure. The Leave requests that are sent to
successors MUST contain the predecessor list of the leaving peer. successors contain the predecessor list of the leaving peer. The
The Leave requests that are sent to the predecessors MUST contain the Leave requests that are sent to the predecessors contain the
successor list of the leaving peer. If a given successor can successor list of the leaving peer. If a given successor can
identify better predecessors than are already included in its identify better predecessors (that is, predecessors that are closer
predecessor lists by investigating the predecessor list it receives to it on the Chord ring than its existing predecessors) than are
from the leaving peer, it MUST send Attach requests to them. already included in its predecessor lists by investigating the
Similarly, if a given predecessor identifies better successors by predecessor list it receives from the leaving peer, it sends Attach
investigating the successor list it receives from the leaving peer, requests to them. Similarly, if a given predecessor identifies
it MUST send Attach requests to them. better successors by investigating the successor list it receives
from the leaving peer, it sends Attach requests to them.
6. Self-tuning Chord Parameters 6. Self-tuning Chord Parameters
This section specifies how to determine an appropriate stabilization This section specifies how to determine an appropriate stabilization
rate and routing table size in an adaptive fashion. The proposed rate and routing table size in an adaptive fashion. The proposed
mechanism is based on [mahajan2003], [liben-nowell2002], and mechanism is based on [mahajan2003], [liben-nowell2002], and
[ghinita2006]. To calculate an appropriate stabilization rate, the [ghinita2006]. To calculate an appropriate stabilization rate, the
values of three parameters must be estimated: overlay size N, failure values of three parameters must be estimated: overlay size N, failure
rate U, and join rate L. To calculate an appropriate routing table rate U, and join rate L. To calculate an appropriate routing table
size, the estimated network size N can be used. Peers in the overlay size, the estimated network size N can be used. Peers in the overlay
skipping to change at page 13, line 4 skipping to change at page 13, line 15
chord-reload algorithm at the end of each stabilization period before chord-reload algorithm at the end of each stabilization period before
re-starting the stabilization timer. re-starting the stabilization timer.
6.1. Estimating Overlay Size 6.1. Estimating Overlay Size
Techniques for estimating the size of an overlay network have been Techniques for estimating the size of an overlay network have been
proposed for instance in [mahajan2003], [horowitz2003], proposed for instance in [mahajan2003], [horowitz2003],
[kostoulas2005], [binzenhofer2006], and [ghinita2006]. In Chord, the [kostoulas2005], [binzenhofer2006], and [ghinita2006]. In Chord, the
density of peer identifiers in the neighborhood set can be used to density of peer identifiers in the neighborhood set can be used to
produce an estimate of the size of the overlay, N [mahajan2003]. produce an estimate of the size of the overlay, N [mahajan2003].
Since peer identifiers are picked randomly with uniform probability Since peer identifiers are picked randomly with uniform probability
from the numBitsInNodeId-bit identifier space, the average distance from the numBitsInNodeId-bit identifier space, the average distance
between peer identifiers in the successor set is between peer identifiers in the successor set is
(2^numBitsInNodeId)/N. (2^numBitsInNodeId)/N.
To estimate the overlay network size, a peer MUST compute the average To estimate the overlay network size, a peer computes the average
inter-peer distance d between the successive peers starting from the inter-peer distance d between the successive peers starting from the
most distant predecessor and ending to the most distant successor in most distant predecessor and ending to the most distant successor in
the successor list. The estimated network size MUST be calculated the successor list. The estimated network size is calculated as:
as:
2^numBitsInNodeId 2^numBitsInNodeId
N = ------------------- N = -------------------
d d
This estimate has been found to be accurate within 15% of the real This estimate has been found to be accurate within 15% of the real
network size [ghinita2006]. Of course, the size of the neighborhood network size [ghinita2006]. Of course, the size of the neighborhood
set affects the accuracy of the estimate. set affects the accuracy of the estimate.
During the join process, a joining peer fills its routing table by During the join process, a joining peer fills its routing table by
skipping to change at page 13, line 37 skipping to change at page 13, line 46
information at its disposal to calculate an estimate of the network information at its disposal to calculate an estimate of the network
size. size.
6.2. Determining Routing Table Size 6.2. Determining Routing Table Size
As specified in RELOAD base, the finger table must contain at least As specified in RELOAD base, the finger table must contain at least
16 entries. When the self-tuning mechanisms are used, the size of 16 entries. When the self-tuning mechanisms are used, the size of
the finger table MUST be set to max(ceiling(log2(N)), 16) using the the finger table MUST be set to max(ceiling(log2(N)), 16) using the
estimated network size N. estimated network size N.
The size of the successor list MUST be set to ceiling(log2(N)). An The size of the successor list MUST be set to a maximum of
implementation MAY place a lower limit on the size of the successor ceiling(log2(N)). An implementation can place a lower limit on the
list. As an example, the implementation might require the size of size of the successor list. As an example, the implementation might
the successor list to be always at least three. require the size of the successor list to be always at least three.
A peer MAY choose to maintain a fixed-size predecessor list with only The size of the predecessor list MUST be set to ceiling(log2(N)).
three entries as specified in RELOAD base. However, it is
RECOMMENDED that a peer maintains ceiling(log2(N)) predecessors.
6.3. Estimating Failure Rate 6.3. Estimating Failure Rate
A typical approach is to assume that peers join the overlay according A typical approach is to assume that peers join the overlay according
to a Poisson process with rate L and leave according to a Poisson to a Poisson process with rate L and leave according to a Poisson
process with rate parameter U [mahajan2003]. The value of U can be process with rate parameter U [mahajan2003]. The value of U can be
estimated using peer failures in the finger table and neighborhood estimated using peer failures in the finger table and neighborhood
set [mahajan2003]. If peers fail with rate U, a peer with M unique set [mahajan2003]. If peers fail with rate U, a peer with M unique
peer identifiers in its routing table should observe K failures in peer identifiers in its routing table should observe K failures in
time K/(M*U). Every peer in the overlay MUST maintain a history of time K/(M*U). Every peer in the overlay maintains a history of the
the last K failures. The current time MUST be inserted into the last K failures. The current time is inserted into the history when
history when the peer joins the overlay. The estimate of U MUST be the peer joins the overlay. The estimate of U is calculated as:
calculated as:
k k
U = --------, U = --------,
M * Tk M * Tk
where M is the number of unique peer identifiers in the routing where M is the number of unique peer identifiers in the routing
table, Tk is the time between the first and the last failure in the table, Tk is the time between the first and the last failure in the
history, and k is the number of failures in the history. If k is history, and k is the number of failures in the history. If k is
smaller than K, the estimate MUST be computed as if there was a smaller than K, the estimate is computed as if there was a failure at
failure at the current time. It has been shown that an estimate the current time. It has been shown that an estimate calculated in a
calculated in a similar manner is accurate within 17% of the real similar manner is accurate within 17% of the real value of U
value of U [ghinita2006]. [ghinita2006].
The size of the failure history K affects the accuracy of the The size of the failure history K affects the accuracy of the
estimate of U. One can increase the accuracy by increasing K. estimate of U. One can increase the accuracy by increasing K.
However, this has the side effect of decreasing responsiveness to However, this has the side effect of decreasing responsiveness to
changes in the failure rate. On the other hand, a small history size changes in the failure rate. On the other hand, a small history size
may cause a peer to overreact each time a new failure occurs. In may cause a peer to overreact each time a new failure occurs. In
[ghinita2006], K is set to 25% of the routing table size. Use of [ghinita2006], K is set to 25% of the routing table size. Use of
this value is RECOMMENDED. this value is RECOMMENDED.
6.3.1. Detecting Failures 6.3.1. Detecting Failures
A new failure MUST be inserted to the failure history in the A new failure is inserted to the failure history in the following
following cases: cases:
1. A Leave request is received from a neigbhor. 1. A Leave request is received from a neigbhor.
2. A peer fails to reply to a Ping request sent in the situation 2. A peer fails to reply to a Ping request sent in the situation
explained below. If no packets have been received on a explained below. If no packets have been received on a
connection during the past 2*Tr seconds (where Tr is the connection during the past 2*Tr seconds (where Tr is the
inactivity timer defined by ICE [RFC5245]), a RELOAD Ping request inactivity timer defined by ICE [RFC5245]), a RELOAD Ping request
MUST be sent to the remote peer. RELOAD mandates the use of STUN MUST be sent to the remote peer. RELOAD mandates the use of STUN
[RFC5389] for keepalives. STUN keepalives take the form of STUN [RFC5389] for keepalives. STUN keepalives take the form of STUN
Binding Indication transactions. As specified in ICE [RFC5245], Binding Indication transactions. As specified in ICE [RFC5245],
a peer sends a STUN Binding Indication if there has been no a peer sends a STUN Binding Indication if there has been no
packet sent on a connection for Tr seconds. Tr is configurable packet sent on a connection for Tr seconds. Tr is configurable
and has a default of 15 seconds. Although STUN Binding and has a default of 15 seconds. Although STUN Binding
Indications do not generate a response, the fact that a peer has Indications do not generate a response, the fact that a peer has
failed can be learned from the lack of packets (Binding failed can be learned from the lack of packets (Binding
Indications or application protocol packets) received from the Indications or application protocol packets) received from the
peer. If the remote peer fails to reply to the Ping request, the peer. If the remote peer fails to reply to the Ping request, the
sender MUST consider the remote peer to have failed. sender should consider the remote peer to have failed.
As an alternative to relying on STUN keepalives to detect peer As an alternative to relying on STUN keepalives to detect peer
failure, a peer could send additional, frequent RELOAD messages to failure, a peer could send additional, frequent RELOAD messages to
every peer in its Connection Table. These messages could be Update every peer in its Connection Table. These messages could be Update
requests, in which case they would serve two purposes: detecting peer requests, in which case they would serve two purposes: detecting peer
failure and stabilization. However, as the cost of this approach can failure and stabilization. However, as the cost of this approach can
be very high in terms of bandwidth consumption and traffic load, be very high in terms of bandwidth consumption and traffic load,
especially in large-scale overlays experiencing churn, its use is NOT especially in large-scale overlays experiencing churn, its use is NOT
RECOMMENDED. RECOMMENDED.
6.4. Estimating Join Rate 6.4. Estimating Join Rate
Reference [ghinita2006] proposes that a peer can estimate the join Reference [ghinita2006] proposes that a peer can estimate the join
rate based on the uptime of the peers in its routing table. An rate based on the uptime of the peers in its routing table. An
increase in peer join rate will be reflected by a decrease in the increase in peer join rate will be reflected by a decrease in the
average age of peers in the routing table. Thus, each peer MUST average age of peers in the routing table. Thus, each peer maintaind
maintain an array of the ages of the peers in its routing table an array of the ages of the peers in its routing table sorted in
sorted in increasing order. Using this information, an estimate of increasing order. Using this information, an estimate of the global
the global peer join rate L MUST be calculated as: peer join rate L is calculated as:
N N
L = ----------------------, L = ----------------------,
Ages[floor(rsize/2)] Ages[floor(rsize/2)]
where Ages is an array containing the ages of the peers in the where Ages is an array containing the ages of the peers in the
routing table sorted in increasing order and rsize is the size of the routing table sorted in increasing order and rsize is the size of the
routing table. It has been shown that the estimate obtained by using routing table. It has been shown that the estimate obtained by using
this method is accurate within 22% of the real join rate this method is accurate within 22% of the real join rate
[ghinita2006]. Of course, the size of the routing table affects the [ghinita2006]. Of course, the size of the routing table affects the
skipping to change at page 15, line 48 skipping to change at page 16, line 8
the stabilization operations since all Update requests carry uptime the stabilization operations since all Update requests carry uptime
values. A joining peer learns the uptime of the admitting peer since values. A joining peer learns the uptime of the admitting peer since
it receives an Update from the admitting peer during the join it receives an Update from the admitting peer during the join
procedure. Peers learn the uptimes of new fingers since they can procedure. Peers learn the uptimes of new fingers since they can
fetch the uptime using a Probe request after having attached to the fetch the uptime using a Probe request after having attached to the
new finger. new finger.
6.5. Estimate Sharing 6.5. Estimate Sharing
To improve the accuracy of network size, join rate, and leave rate To improve the accuracy of network size, join rate, and leave rate
estimates, peers MUST share their estimates. When the stabilization estimates, peers share their estimates. When the stabilization timer
timer fires, a peer MUST select number-of-peers-to-probe random peers fires, a peer selects number-of-peers-to-probe random peers from its
from its finger table and send each of them a Probe request. The finger table and send each of them a Probe request. The targets of
targets of Probe requests are selected from the finger table rather Probe requests are selected from the finger table rather than from
than from the neighbor table since neighbors are likely to make the neighbor table since neighbors are likely to make similar errors
similar errors when calculating their estimates. number-of-peers-to- when calculating their estimates. number-of-peers-to-probe is a new
probe is a new element in the overlay configuration document. It is element in the overlay configuration document. It is defined in
defined in Section 7. Both the Probe request and the answer returned Section 7. Both the Probe request and the answer returned by the
by the target peer MUST contain a new message extension whose target peer MUST contain a new message extension whose
MessageExtensionType is 'self_tuning_data'. This extension type is MessageExtensionType is 'self_tuning_data'. This extension type is
defined in Section 9.1. The extension_contents field of the defined in Section 9.1. The extension_contents field of the
MessageExtension structure MUST contain a SelfTuningData structure: MessageExtension structure MUST contain a SelfTuningData structure:
struct { struct {
uint32 network_size; uint32 network_size;
uint32 join_rate; uint32 join_rate;
uint32 leave_rate; uint32 leave_rate;
} SelfTuningData; } SelfTuningData;
skipping to change at page 16, line 42 skipping to change at page 17, line 9
The join and leave rates are expressed as joins or failures per 24 The join and leave rates are expressed as joins or failures per 24
hours. As an example, if the global join rate estimate a peer has hours. As an example, if the global join rate estimate a peer has
calculated is 0.123 peers/s, it would include in the join_rate field calculated is 0.123 peers/s, it would include in the join_rate field
the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is, the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is,
the value 10628. the value 10628.
The 'type' field of the MessageExtension structure MUST be set to The 'type' field of the MessageExtension structure MUST be set to
contain the value 'self_tuning_data'. The 'critical' field of the contain the value 'self_tuning_data'. The 'critical' field of the
structure MUST be set to False. structure MUST be set to False.
A peer MUST store all estimates it receives in Probe requests and A peer stores all estimates it receives in Probe requests and answers
answers during a stabilization interval. When the stabilization during a stabilization interval. When the stabilization timer fires,
timer fires, the peer MUST calculate the estimates to be used during the peer calculates the estimates to be used during the next
the next stabilization interval by taking the 75th percentile (i.e., stabilization interval by taking the 75th percentile (i.e., third
third quartile) of a data set containing its own estimate and the quartile) of a data set containing its own estimate and the received
received estimates. estimates.
The default value for number-of-peers-to-probe is 4. This default The default value for number-of-peers-to-probe is 4. This default
value is recommended to allow a peer to receive a sufficiently large value is recommended to allow a peer to receive a sufficiently large
set of estimates from other peers. With a value of 4, a peer set of estimates from other peers. With a value of 4, a peer
receives four estimates in Probe answers. On the average, each peer receives four estimates in Probe answers. On the average, each peer
also receives four Probe requests each carrying an estimate. Thus, also receives four Probe requests each carrying an estimate. Thus,
on the average, each peer has nine estimates (including its own) that on the average, each peer has nine estimates (including its own) that
it can use at the end of the stablization interval. A value smaller it can use at the end of the stablization interval. A value smaller
than 4 is NOT RECOMMENDED to keep the number of received estimates than 4 is NOT RECOMMENDED to keep the number of received estimates
high enough. As an example, if the value were 2, there would be high enough. As an example, if the value were 2, there would be
skipping to change at page 17, line 35 skipping to change at page 17, line 43
According to [liben-nowell2002], a Chord network in a ring-like state According to [liben-nowell2002], a Chord network in a ring-like state
remains in a ring-like state as long as peers send remains in a ring-like state as long as peers send
Omega(square(log(N))) messages before N new peers join or N/2 peers Omega(square(log(N))) messages before N new peers join or N/2 peers
fail. We can use the estimate of peer failure rate, U, to calculate fail. We can use the estimate of peer failure rate, U, to calculate
the time Tf in which N/2 peers fail: the time Tf in which N/2 peers fail:
1 1
Tf = ------ Tf = ------
2*U 2*U
Based on this estimate, a stabilization interval Tstab-1 MUST be Based on this estimate, a stabilization interval Tstab-1 is
calculated as: calculated as:
Tf Tf
Tstab-1 = ----------------- Tstab-1 = -----------------
square(log2(N)) square(log2(N))
On the other hand, the estimated join rate L can be used to calculate On the other hand, the estimated join rate L can be used to calculate
the time in which N new peers join the overlay. Based on the the time in which N new peers join the overlay. Based on the
estimate of L, a stabilization interval Tstab-2 MUST be calculated estimate of L, a stabilization interval Tstab-2 is calculated as:
as:
N N
Tstab-2 = --------------------- Tstab-2 = ---------------------
L * square(log2(N)) L * square(log2(N))
Finally, the actual stabilization interval Tstab that MUST be used Finally, the actual stabilization interval Tstab that is used can be
can be obtained by taking the minimum of Tstab-1 and Tstab-2. obtained by taking the minimum of Tstab-1 and Tstab-2.
The results obtained in [maenpaa2009] indicate that making the The results obtained in [maenpaa2009] indicate that making the
stabilization interval too small has the effect of making the overlay stabilization interval too small has the effect of making the overlay
less stable (e.g., in terms of detected loops and path failures). less stable (e.g., in terms of detected loops and path failures).
Thus, a lower limit should be used for the stabilization period. Thus, a lower limit should be used for the stabilization period.
Based on the results in [maenpaa2009], a lower limit of 15s is Based on the results in [maenpaa2009], a lower limit of 15s is
RECOMMENDED, since using a stabilization period smaller than this RECOMMENDED, since using a stabilization period smaller than this
will with a high probability cause too much traffic in the overlay. will with a high probability cause too much traffic in the overlay.
7. Overlay Configuration Document Extension 7. Overlay Configuration Document Extension
skipping to change at page 19, line 26 skipping to change at page 19, line 30
| Extension Name | Code | Specification | | Extension Name | Code | Specification |
+------------------+-------+---------------+ +------------------+-------+---------------+
| self_tuning_data | 0x3 | RFC-AAAA | | self_tuning_data | 0x3 | RFC-AAAA |
+------------------+-------+---------------+ +------------------+-------+---------------+
The contents of the extension are defined in Section 6.5. The contents of the extension are defined in Section 6.5.
Note to RFC Editor: please replace AAAA with the RFC number for this Note to RFC Editor: please replace AAAA with the RFC number for this
specification. specification.
9.2. A New IETF XML Registry 9.2. New Overlay Algorithm Type
This document introduces one additional overlay algorithm type to the
"RELOAD Overlay Algorithm Types" Registry:
+-------------------+-----------+
| Algorithm Name | Reference |
+-------------------+-----------+
| CHORD-SELF-TUNING | RFC-AAAA |
+-------------------+-----------+
Note to RFC Editor: please replace AAAA with the RFC number for this
specification.
9.3. A New IETF XML Registry
This document registers one new URI for the self-tuning namespace in This document registers one new URI for the self-tuning namespace in
the "ns" subregistry of the IETF XML registry defined in [RFC3688]. the "ns" subregistry of the IETF XML registry defined in [RFC3688].
URI: urn:ietf:params:xml:ns:p2p:self-tuning URI: urn:ietf:params:xml:ns:p2p:self-tuning
Registrant Contact: The IESG Registrant Contact: The IESG
XML: N/A, the requested URI is an XML namespace XML: N/A, the requested URI is an XML namespace
10. Acknowledgments 10. Acknowledgments
The authors would like to thank Jani Hautakorpi for his contributions The authors would like to thank Jani Hautakorpi for his contributions
to the document. The authors would also like to thank Carlos to the document. The authors would also like to thank Carlos
Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry
Leiba for their comments on the document. Leiba for their comments on the document.
 End of changes. 38 change blocks. 
108 lines changed or deleted 126 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/