draft-ietf-p2psip-self-tuning-10.txt   draft-ietf-p2psip-self-tuning-11.txt 
P2PSIP Working Group J. Maenpaa P2PSIP Working Group J. Maenpaa
Internet-Draft G. Camarillo Internet-Draft G. Camarillo
Intended status: Standards Track Ericsson Intended status: Standards Track Ericsson
Expires: August 6, 2014 February 2, 2014 Expires: November 10, 2014 May 9, 2014
Self-tuning Distributed Hash Table (DHT) for REsource LOcation And Self-tuning Distributed Hash Table (DHT) for REsource LOcation And
Discovery (RELOAD) Discovery (RELOAD)
draft-ietf-p2psip-self-tuning-10.txt draft-ietf-p2psip-self-tuning-11.txt
Abstract Abstract
REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P)
signaling protocol that provides an overlay network service. Peers signaling protocol that provides an overlay network service. Peers
in a RELOAD overlay network collectively run an overlay algorithm to in a RELOAD overlay network collectively run an overlay algorithm to
organize the overlay, and to store and retrieve data. This document organize the overlay, and to store and retrieve data. This document
describes how the default topology plugin of RELOAD can be extended describes how the default topology plugin of RELOAD can be extended
to support self-tuning, that is, to adapt to changing operating to support self-tuning, that is, to adapt to changing operating
conditions such as churn and network size. conditions such as churn and network size.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 6, 2014. This Internet-Draft will expire on November 10, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 22 skipping to change at page 2, line 22
3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5
3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 5 3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 5
3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6 3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6
3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 7 3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 7
4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8 4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8
5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9
5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10 5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10
5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 10 5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 10
5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11 5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11
5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 11 5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 11
5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 11 5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 12
5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12 5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12
6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12 6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12
6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 12 6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 12
6.2. Determining Routing Table Size . . . . . . . . . . . . . 13 6.2. Determining Routing Table Size . . . . . . . . . . . . . 13
6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 13 6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 13
6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14 6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14
6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15 6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15
6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 15 6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 15
6.6. Calculating the Stabilization Interval . . . . . . . . . 16 6.6. Calculating the Stabilization Interval . . . . . . . . . 17
7. Overlay Configuration Document Extension . . . . . . . . . . 17 7. Overlay Configuration Document Extension . . . . . . . . . . 17
8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 8. Security Considerations . . . . . . . . . . . . . . . . . . . 18
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 18 9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 18
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 18 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19
11.1. Normative References . . . . . . . . . . . . . . . . . . 19 11.1. Normative References . . . . . . . . . . . . . . . . . . 19
11.2. Informative References . . . . . . . . . . . . . . . . . 19 11.2. Informative References . . . . . . . . . . . . . . . . . 19
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21
1. Introduction 1. Introduction
REsource LOcation And Discovery (RELOAD) [I-D.ietf-p2psip-base] is a REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer
peer-to-peer signaling protocol that can be used to maintain an signaling protocol that can be used to maintain an overlay network,
overlay network, and to store data in and retrieve data from the and to store data in and retrieve data from the overlay. For
overlay. For interoperability reasons, RELOAD specifies one overlay interoperability reasons, RELOAD specifies one overlay algorithm,
algorithm, called chord-reload, that is mandatory to implement. This called chord-reload, that is mandatory to implement. This document
document extends the chord-reload algorithm by introducing self- extends the chord-reload algorithm by introducing self-tuning
tuning behavior. behavior.
Distributed Hash Table (DHT) based overlay networks are self- Distributed Hash Table (DHT) based overlay networks are self-
organizing, scalable and reliable. However, these features come at a organizing, scalable and reliable. However, these features come at a
cost: peers in the overlay network need to consume network bandwidth cost: peers in the overlay network need to consume network bandwidth
to maintain routing state. Most DHTs use a periodic stabilization to maintain routing state. Most DHTs use a periodic stabilization
routine to counter the undesirable effects of churn on routing. To routine to counter the undesirable effects of churn on routing. To
configure the parameters of a DHT, some characteristics such as churn configure the parameters of a DHT, some characteristics such as churn
rate and network size need to be known in advance. These rate and network size need to be known in advance. These
characteristics are then used to configure the DHT in a static characteristics are then used to configure the DHT in a static
fashion by using fixed values for parameters such as the size of the fashion by using fixed values for parameters such as the size of the
skipping to change at page 4, line 28 skipping to change at page 4, line 28
2^(numBitsInNodeId-i) on the Chord ring. This peer is called the 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the
ith finger of peer n. As an example, the first entry in the finger ith finger of peer n. As an example, the first entry in the finger
table of peer n contains a peer half-way around the Chord ring table of peer n contains a peer half-way around the Chord ring
from peer n. The purpose of the finger table is to accelerate from peer n. The purpose of the finger table is to accelerate
lookups. lookups.
n.id: An abbreviation that is in this document used refer to the n.id: An abbreviation that is in this document used refer to the
Node-ID of peer n. Node-ID of peer n.
O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means
that f(n) is less than some constant multiple of g(n). that f(n) is less than some constant multiple of g(n). For the
formal definition, please refer to [weiss1998].
Omega(g(n)): Informally, saying that some equation f(n) = Omega(g(n)): Informally, saying that some equation f(n) =
Omega(g(n)) means that f(n) is more than some constant multiple of Omega(g(n)) means that f(n) is more than some constant multiple of
g(n). g(n). For the formal definition, please refer to [weiss1998]
Predecessor List: A data structure containing the first predecessors Percentile: The Pth (0<=P<=100) percentile of N values arranged in
of a peer on the Chord ring. ascending order is obtained by first calculating the (ordinal)
rank n=(P/100)*N, rounding the result to the nearest integer, and
then taking the value corresponding to that rank.
Predecessor List: A data structure containing the first r
predecessors of a peer on the Chord ring.
Successor List: A data structure containing the first r successors Successor List: A data structure containing the first r successors
of a peer on the Chord ring. of a peer on the Chord ring.
Neighborhood Set: A term used to refer to the set of peers included Neighborhood Set: A term used to refer to the set of peers included
in the successor and predecessor lists of a given peer. in the successor and predecessor lists of a given peer.
Routing Table: Contents of a given peer's routing table include the Routing Table: Contents of a given peer's routing table include the
set of peers that the peer can use to route overlay messages. The set of peers that the peer can use to route overlay messages. The
routing table is made up of the finger table, successor list and routing table is made up of the finger table, successor list and
skipping to change at page 5, line 36 skipping to change at page 5, line 39
3.1. Reactive vs. Periodic Stabilization 3.1. Reactive vs. Periodic Stabilization
In reactive stabilization, a peer reacts to the loss of a peer in its In reactive stabilization, a peer reacts to the loss of a peer in its
neighborhood set or to the appearance of a new peer that should be neighborhood set or to the appearance of a new peer that should be
added to its neighborhood set by sending a copy of its neighbor table added to its neighborhood set by sending a copy of its neighbor table
to all peers in the neighborhood set. Periodic recovery, in to all peers in the neighborhood set. Periodic recovery, in
contrast, takes place independently of changes in the neighborhood contrast, takes place independently of changes in the neighborhood
set. In periodic recovery, a peer periodically shares its set. In periodic recovery, a peer periodically shares its
neighborhood set with each or a subset of the members of that set. neighborhood set with each or a subset of the members of that set.
The chord-reload algorithm [I-D.ietf-p2psip-base] supports both The chord-reload algorithm [RFC6940] supports both reactive and
reactive and periodic stabilization. It has been shown in [rhea2004] periodic stabilization. It has been shown in [rhea2004] that
that reactive stabilization works well for small neighborhood sets reactive stabilization works well for small neighborhood sets (i.e.,
(i.e., small overlays) and moderate churn. However, in large-scale small overlays) and moderate churn. However, in large-scale (e.g.,
(e.g., 1000 peers or more [rhea2004]) or high-churn overlays, 1000 peers or more [rhea2004]) or high-churn overlays, reactive
reactive stabilization runs the risk of creating a positive feedback stabilization runs the risk of creating a positive feedback cycle,
cycle, which can eventually result in congestion collapse. In which can eventually result in congestion collapse. In [rhea2004],
[rhea2004], it is shown that a 1000-peer overlay under churn uses it is shown that a 1000-peer overlay under churn uses significantly
significantly less bandwidth and has lower latencies when periodic less bandwidth and has lower latencies when periodic stabilization is
stabilization is used than when reactive stabilization is used. used than when reactive stabilization is used. Although in the
Although in the experiments carried out in [rhea2004], reactive experiments carried out in [rhea2004], reactive stabilization
stabilization performed well when there was no churn, its bandwidth performed well when there was no churn, its bandwidth use was
use was observed to jump dramatically under churn. At higher churn observed to jump dramatically under churn. At higher churn rates and
rates and larger scale overlays, periodic stabilization uses less larger scale overlays, periodic stabilization uses less bandwidth and
bandwidth and the resulting lower contention for the network leads to the resulting lower contention for the network leads to lower
lower latencies. For this reason, most DHTs such as CAN [CAN], Chord latencies. For this reason, most DHTs such as CAN [CAN], Chord
[Chord], Pastry [Pastry], Bamboo [rhea2004], etc. use periodic [Chord], Pastry [Pastry], Bamboo [rhea2004], etc. use periodic
stabilization [ghinita2006]. As an example, the first version of stabilization [ghinita2006]. As an example, the first version of
Bamboo used reactive stabilization, which caused Bamboo to suffer Bamboo used reactive stabilization, which caused Bamboo to suffer
from degradation in performance under churn. To fix this problem, from degradation in performance under churn. To fix this problem,
Bamboo was modified to use periodic stabilization. Bamboo was modified to use periodic stabilization.
In Chord, periodic stabilization is typically done both for In Chord, periodic stabilization is typically done both for
successors and fingers. An alternative strategy is analyzed in successors and fingers. An alternative strategy is analyzed in
[krishnamurthy2008]. In this strategy, called the correction-on- [krishnamurthy2008]. In this strategy, called the correction-on-
change maintenance strategy, a peer periodically stabilizes its change maintenance strategy, a peer periodically stabilizes its
skipping to change at page 7, line 16 skipping to change at page 7, line 23
evolve with time, it is not possible to achieve both a low lookup evolve with time, it is not possible to achieve both a low lookup
failure rate and a low communication overhead by using fixed failure rate and a low communication overhead by using fixed
parameters [ghinita2006]. parameters [ghinita2006].
As an example, to configure the Chord DHT algorithm, one needs to As an example, to configure the Chord DHT algorithm, one needs to
select values for the following parameters: size of successor list, select values for the following parameters: size of successor list,
stabilization interval, and size of the finger table. To select an stabilization interval, and size of the finger table. To select an
appropriate value for the stabilization interval, one needs to know appropriate value for the stabilization interval, one needs to know
the expected churn rate and overlay size. According to the expected churn rate and overlay size. According to
[liben-nowell2002], a Chord network in a ring-like state remains in a [liben-nowell2002], a Chord network in a ring-like state remains in a
ring-like state as long as peers send Omega(log2^2(N)) messages ring-like state as long as peers send Omega(square(log(N))) messages
before N new peers join or N/2 peers fail. Thus, in a 500-peer before N new peers join or N/2 peers fail. Thus, in a 500-peer
overlay churning at a rate such that one peer joins and one peer overlay churning at a rate such that one peer joins and one peer
leaves the network every 30 seconds, an appropriate stabilization leaves the network every 30 seconds, an appropriate stabilization
interval would be on the order of 93s. According to [Chord], the size interval would be on the order of 93s. According to [Chord], the size
of the successor list and finger table should be on the order of of the successor list and finger table should be on the order of
log2(N). Having a successor list of size O(log2(N)) makes it log(N). Already a successor list of a modest size (e.g., log2(N) or
unlikely that a peer will lose all of its successors, which would 2*log2(N), which is the successor list size used in [Chord]) makes it
cause the Chord ring to become disconnected. Thus, in a 500-peer very unlikely that a peer will lose all of its successors, which
network each peer should maintain on the order of nine successors and would cause the Chord ring to become disconnected. Thus, in a
fingers. However, if the churn rate doubles and the network size 500-peer network each peer should maintain on the order of nine
remains unchanged, the stabilization rate should double as well. successors and fingers. However, if the churn rate doubles and the
That is, the appropriate maintenance interval would now be on the network size remains unchanged, the stabilization rate should double
order of 46s. On the other hand, if the churn rate becomes e.g. six- as well. That is, the appropriate maintenance interval would now be
fold and the size of the network grows to 2000 peers, on the order of on the order of 46s. On the other hand, if the churn rate becomes
eleven fingers and successors should be maintained and the e.g. six-fold and the size of the network grows to 2000 peers, on the
order of eleven fingers and successors should be maintained and the
stabilization interval should be on the order of 42s. If one stabilization interval should be on the order of 42s. If one
continued using the old values, this could result in inaccurate continued using the old values, this could result in inaccurate
routing tables, network partitioning, and deteriorating performance. routing tables, network partitioning, and deteriorating performance.
3.3. Adaptive Stabilization 3.3. Adaptive Stabilization
A self-tuning DHT takes into consideration the continuous evolution A self-tuning DHT takes into consideration the continuous evolution
of network conditions and adapts to them. In a self-tuning DHT, each of network conditions and adapts to them. In a self-tuning DHT, each
peer collects statistical data about the network and dynamically peer collects statistical data about the network and dynamically
adjusts its stabilization rate, neighborhood set size, and finger adjusts its stabilization rate, neighborhood set size, and finger
skipping to change at page 8, line 43 skipping to change at page 8, line 46
table of peer n contains the peer half-way around the ring from peer table of peer n contains the peer half-way around the ring from peer
n. The second entry contains the peer that is 1/4th of the way n. The second entry contains the peer that is 1/4th of the way
around, the third entry the peer that is 1/8th of the way around, around, the third entry the peer that is 1/8th of the way around,
etc. In other words, the ith entry in the finger table at peer n etc. In other words, the ith entry in the finger table at peer n
contains the identity of the first peer s that succeeds n by at least contains the identity of the first peer s that succeeds n by at least
2^(numBitsInNodeId-i) on the Chord ring. This peer is called the ith 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the ith
finger of peer n. The interval between two consecutive fingers is finger of peer n. The interval between two consecutive fingers is
called a finger interval. The ith finger interval of peer n covers called a finger interval. The ith finger interval of peer n covers
the range [n.id + 2^(numBitsInNodeId-i), n.id + the range [n.id + 2^(numBitsInNodeId-i), n.id +
2^(numBitsInNodeId-i+1)) on the Chord ring. In an N-peer network, 2^(numBitsInNodeId-i+1)) on the Chord ring. In an N-peer network,
each peer maintains information about O(log2(N)) other peers in its each peer maintains information about O(log(N)) other peers in its
finger table. As an example, if N=100000, it is sufficient to finger table. As an example, if N=100000, it is sufficient to
maintain 17 fingers. maintain 17 fingers.
Chord needs all peers' successor pointers to be up to date in order Chord needs all peers' successor pointers to be up to date in order
to ensure that lookups produce correct results as the set of to ensure that lookups produce correct results as the set of
participating peers changes. To achieve this, peers run a participating peers changes. To achieve this, peers run a
stabilization protocol periodically in the background. The stabilization protocol periodically in the background. The
stabilization protocol of the original Chord algorithm uses two stabilization protocol of the original Chord algorithm uses two
operations: successor stabilization and finger stabilization. operations: successor stabilization and finger stabilization.
However, the Chord algorithm of RELOAD base defines two additional However, the Chord algorithm of RELOAD base defines two additional
stabilization components, as will be discussed below. stabilization components, as will be discussed below.
To increase robustness in the event of peer failures, each Chord peer To increase robustness in the event of peer failures, each Chord peer
maintains a successor list of size r, containing the peer's first r maintains a successor list of size r, containing the peer's first r
successors. The benefit of successor lists is that if each peer successors. The benefit of successor lists is that if each peer
fails independently with probability p, the probability that all r fails independently with probability p, the probability that all r
successors fail simultaneously is only p^r. successors fail simultaneously is only p^r.
The original Chord algorithm maintains only a single predecessor The original Chord algorithm maintains only a single predecessor
skipping to change at page 9, line 23 skipping to change at page 9, line 26
The original Chord algorithm maintains only a single predecessor The original Chord algorithm maintains only a single predecessor
pointer. However, multiple predecessor pointers (i.e., a predecessor pointer. However, multiple predecessor pointers (i.e., a predecessor
list) can be maintained to speed up recovery from predecessor list) can be maintained to speed up recovery from predecessor
failures. The routing table of a peer consists of the successor failures. The routing table of a peer consists of the successor
list, finger table, and predecessor list. list, finger table, and predecessor list.
5. Extending Chord-reload to Support Self-tuning 5. Extending Chord-reload to Support Self-tuning
This section describes how the mandatory-to-implement chord-reload This section describes how the mandatory-to-implement chord-reload
algorithm defined in RELOAD base [I-D.ietf-p2psip-base] can be algorithm defined in RELOAD base [RFC6940] can be extended to support
extended to support self-tuning. self-tuning.
The chord-reload algorithm supports both reactive and periodic The chord-reload algorithm supports both reactive and periodic
recovery strategies. When the self-tuning mechanisms defined in this recovery strategies. When the self-tuning mechanisms defined in this
document are used, the periodic recovery strategy MUST be used. document are used, the periodic recovery strategy MUST be used.
Further, chord-reload specifies that at least three predecessors and Further, chord-reload specifies that at least three predecessors and
three successors need to be maintained. When the self-tuning three successors need to be maintained. When the self-tuning
mechanisms are used, the appropriate sizes of the successor list and mechanisms are used, the appropriate sizes of the successor list and
predecessor list are determined in an adaptive fashion based on the predecessor list are determined in an adaptive fashion based on the
estimated network size, as will be described in Section 6. estimated network size, as will be described in Section 6.
skipping to change at page 9, line 51 skipping to change at page 10, line 8
stabilization. stabilization.
2. Refreshing the finger table. We refer to this as finger 2. Refreshing the finger table. We refer to this as finger
stabilization. stabilization.
3. Adjusting finger table size. 3. Adjusting finger table size.
4. Detecting partitioning. We refer to this as strong 4. Detecting partitioning. We refer to this as strong
stabilization. stabilization.
As specified in RELOAD base [I-D.ietf-p2psip-base], a peer sends As specified in RELOAD base [RFC6940], a peer sends periodic messages
periodic messages as part of the neighbor stabilization, finger as part of the neighbor stabilization, finger stabilization, and
stabilization, and strong stabilization routines. In neighbor strong stabilization routines. In neighbor stabilization, a peer
stabilization, a peer periodically sends an Update request to every periodically sends an Update request to every peer in its Connection
peer in its Connection Table. The default time is every ten minutes. Table. The default time is every ten minutes. In finger
In finger stabilization, a peer periodically searches for new peers stabilization, a peer periodically searches for new peers to include
to include in its finger table. This time defaults to one hour. in its finger table. This time defaults to one hour. This document
This document specifies how the neighbor stabilization and finger specifies how the neighbor stabilization and finger stabilization
stabilization intervals can be determined in an adaptive fashion intervals can be determined in an adaptive fashion based on the
based on the operating conditions of the overlay. The subsections operating conditions of the overlay. The subsections below describe
below describe how this document extends the four components of how this document extends the four components of stabilization.
stabilization.
5.1. Update Requests 5.1. Update Requests
As described in RELOAD base [I-D.ietf-p2psip-base], the neighbor and As described in RELOAD base [RFC6940], the neighbor and finger
finger stabilization procedures are implemented using Update stabilization procedures are implemented using Update requests.
requests. RELOAD base defines three types of Update requests: RELOAD base defines three types of Update requests: 'peer_ready',
'peer_ready', 'neighbors', and 'full'. Regardless of the type, all 'neighbors', and 'full'. Regardless of the type, all Update requests
Update requests include an 'uptime' field. Since the self-tuning include an 'uptime' field. Since the self-tuning extensions require
extensions require information on the uptimes of peers in the routing information on the uptimes of peers in the routing table, the sender
table, the sender of an Update request MUST include its current of an Update request MUST include its current uptime in seconds in
uptime in seconds in the 'uptime' field. the 'uptime' field.
When self-tuning is used, each peer decides independently the When self-tuning is used, each peer decides independently the
appropriate size for the successor list, predecessor list and finger appropriate size for the successor list, predecessor list and finger
table. Thus, the 'predecessors', 'successors', and 'fingers' fields table. Thus, the 'predecessors', 'successors', and 'fingers' fields
included in RELOAD Update requests are of variable length. As included in RELOAD Update requests are of variable length. As
specified in RELOAD [I-D.ietf-p2psip-base], variable length fields specified in RELOAD [RFC6940], variable length fields are on the wire
are on the wire preceded by length bytes. In the case of the preceded by length bytes. In the case of the successor list,
successor list, predecessor list, and finger table, there are two predecessor list, and finger table, there are two length bytes
length bytes (allowing lengths up to 2^16-1). The number of NodeId (allowing lengths up to 2^16-1). The number of NodeId structures
structures included in each field can be calculated based on the included in each field can be calculated based on the length bytes
length bytes since the size of a single NodeId structure is 16 bytes. since the size of a single NodeId structure is 16 bytes. If a peer
If a peer receives more entries than fit into its successor list, receives more entries than fit into its successor list, predecessor
predecessor list or finger table, the peer MUST ignore the extra list or finger table, the peer MUST ignore the extra entries. If a
entries. If a peer receives less entries than it currently has in peer receives less entries than it currently has in its own data
its own data structure, the peer MUST NOT drop the extra entries from structure, the peer MUST NOT drop the extra entries from its data
its data structure. structure.
5.2. Neighbor Stabilization 5.2. Neighbor Stabilization
In the neighbor stabilization operation of chord-reload, a peer In the neighbor stabilization operation of chord-reload, a peer
periodically sends an Update request to every peer in its Connection periodically sends an Update request to every peer in its Connection
Table. In a small, low-churn overlay, the amount of traffic this Table. In a small, low-churn overlay, the amount of traffic this
process generates is typically acceptable. However, in a large-scale process generates is typically acceptable. However, in a large-scale
overlay churning at a moderate or high churn rate, the traffic load overlay churning at a moderate or high churn rate, the traffic load
may no longer be acceptable since the size of the connection table is may no longer be acceptable since the size of the connection table is
large and the stabilization interval relatively short. The self- large and the stabilization interval relatively short. The self-
skipping to change at page 11, line 32 skipping to change at page 11, line 36
5.3. Finger Stabilization 5.3. Finger Stabilization
Chord-reload specifies two alternative methods for searching for new Chord-reload specifies two alternative methods for searching for new
peers to the finger table. Both of the alternatives can be used with peers to the finger table. Both of the alternatives can be used with
the self-tuning extensions defined in this document. the self-tuning extensions defined in this document.
Immediately after a new peer has been added to the finger table, a Immediately after a new peer has been added to the finger table, a
Probe request MUST be sent to the new peer to fetch its uptime. The Probe request MUST be sent to the new peer to fetch its uptime. The
requested_info field of the Probe request MUST be set to contain the requested_info field of the Probe request MUST be set to contain the
ProbeInformationType 'uptime' defined in RELOAD base ProbeInformationType 'uptime' defined in RELOAD base [RFC6940].
[I-D.ietf-p2psip-base].
5.4. Adjusting Finger Table Size 5.4. Adjusting Finger Table Size
The chord-reload algorithm defines how a peer can make sure that the The chord-reload algorithm defines how a peer can make sure that the
finger table is appropriately sized to allow for efficient routing. finger table is appropriately sized to allow for efficient routing.
Since the self-tuning mechanisms specified in this document produce a Since the self-tuning mechanisms specified in this document produce a
network size estimate, this estimate can be directly used to network size estimate, this estimate can be directly used to
calculate the optimal size for the finger table. This mechanism MUST calculate the optimal size for the finger table. This mechanism MUST
be used instead of the one specified by chord-reload. A peer MUST be used instead of the one specified by chord-reload. A peer MUST
use the network size estimate to determine whether it needs to adjust use the network size estimate to determine whether it needs to adjust
the size of its finger table each time when the stabilization timer the size of its finger table each time when the stabilization timer
fires. The way this is done is explained in Section 6.2. fires. The way this is done is explained in Section 6.2.
5.5. Detecting Partitioning 5.5. Detecting Partitioning
This document does not require any changes to the mechanism chord- This document does not require any changes to the mechanism chord-
reload uses to detect network partitioning. reload uses to detect network partitioning.
5.6. Leaving the Overlay 5.6. Leaving the Overlay
As specified in RELOAD base [I-D.ietf-p2psip-base], a leaving peer As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a
SHOULD send a Leave request to all members of its neighbor table Leave request to all members of its neighbor table prior to leaving
prior to leaving the overlay. The overlay_specific_data field MUST the overlay. The overlay_specific_data field MUST contain the
contain the ChordLeaveData structure. The Leave requests that are ChordLeaveData structure. The Leave requests that are sent to
sent to successors MUST contain the predecessor list of the leaving successors MUST contain the predecessor list of the leaving peer.
peer. The Leave requests that are sent to the predecessors MUST The Leave requests that are sent to the predecessors MUST contain the
contain the successor list of the leaving peer. If a given successor successor list of the leaving peer. If a given successor can
can identify better predecessors than are already included in its identify better predecessors than are already included in its
predecessor lists by investigating the predecessor list it receives predecessor lists by investigating the predecessor list it receives
from the leaving peer, it MUST send Attach requests to them. from the leaving peer, it MUST send Attach requests to them.
Similarly, if a given predecessor identifies better successors by Similarly, if a given predecessor identifies better successors by
investigating the successor list it receives from the leaving peer, investigating the successor list it receives from the leaving peer,
it MUST send Attach requests to them. it MUST send Attach requests to them.
6. Self-tuning Chord Parameters 6. Self-tuning Chord Parameters
This section specifies how to determine an appropriate stabilization This section specifies how to determine an appropriate stabilization
rate and routing table size in an adaptive fashion. The proposed rate and routing table size in an adaptive fashion. The proposed
skipping to change at page 13, line 15 skipping to change at page 13, line 18
2^numBitsInNodeId 2^numBitsInNodeId
N = ------------------- N = -------------------
d d
This estimate has been found to be accurate within 15% of the real This estimate has been found to be accurate within 15% of the real
network size [ghinita2006]. Of course, the size of the neighborhood network size [ghinita2006]. Of course, the size of the neighborhood
set affects the accuracy of the estimate. set affects the accuracy of the estimate.
During the join process, a joining peer fills its routing table by During the join process, a joining peer fills its routing table by
sending a series of Ping and Attach requests, as specified in RELOAD sending a series of Ping and Attach requests, as specified in RELOAD
base [I-D.ietf-p2psip-base]. Thus, a joining peer immediately has base [RFC6940]. Thus, a joining peer immediately has enough
enough information at its disposal to calculate an estimate of the information at its disposal to calculate an estimate of the network
network size. size.
6.2. Determining Routing Table Size 6.2. Determining Routing Table Size
As specified in RELOAD base, the finger table must contain at least As specified in RELOAD base, the finger table must contain at least
16 entries. When the self-tuning mechanisms are used, the size of 16 entries. When the self-tuning mechanisms are used, the size of
the finger table MUST be set to max(log2(N), 16) using the estimated the finger table MUST be set to max(ceiling(log2(N)), 16) using the
network size N. estimated network size N.
The size of the successor list MUST be set to log2(N). An The size of the successor list MUST be set to ceiling(log2(N)). An
implementation MAY place a lower limit on the size of the successor implementation MAY place a lower limit on the size of the successor
list. As an example, the implementation might require the size of list. As an example, the implementation might require the size of
the successor list to be always at least three. the successor list to be always at least three.
A peer MAY choose to maintain a fixed-size predecessor list with only A peer MAY choose to maintain a fixed-size predecessor list with only
three entries as specified in RELOAD base. However, it is three entries as specified in RELOAD base. However, it is
RECOMMENDED that a peer maintains log2(N) predecessors. RECOMMENDED that a peer maintains ceiling(log2(N)) predecessors.
6.3. Estimating Failure Rate 6.3. Estimating Failure Rate
A typical approach is to assume that peers join the overlay according A typical approach is to assume that peers join the overlay according
to a Poisson process with rate L and leave according to a Poisson to a Poisson process with rate L and leave according to a Poisson
process with rate parameter U [mahajan2003]. The value of U can be process with rate parameter U [mahajan2003]. The value of U can be
estimated using peer failures in the finger table and neighborhood estimated using peer failures in the finger table and neighborhood
set [mahajan2003]. If peers fail with rate U, a peer with M unique set [mahajan2003]. If peers fail with rate U, a peer with M unique
peer identifiers in its routing table should observe K failures in peer identifiers in its routing table should observe K failures in
time K/(M*U). Every peer in the overlay MUST maintain a history of time K/(M*U). Every peer in the overlay MUST maintain a history of
skipping to change at page 15, line 15 skipping to change at page 15, line 18
6.4. Estimating Join Rate 6.4. Estimating Join Rate
Reference [ghinita2006] proposes that a peer can estimate the join Reference [ghinita2006] proposes that a peer can estimate the join
rate based on the uptime of the peers in its routing table. An rate based on the uptime of the peers in its routing table. An
increase in peer join rate will be reflected by a decrease in the increase in peer join rate will be reflected by a decrease in the
average age of peers in the routing table. Thus, each peer MUST average age of peers in the routing table. Thus, each peer MUST
maintain an array of the ages of the peers in its routing table maintain an array of the ages of the peers in its routing table
sorted in increasing order. Using this information, an estimate of sorted in increasing order. Using this information, an estimate of
the global peer join rate L MUST be calculated as: the global peer join rate L MUST be calculated as:
N N
L = ---------------, L = ----------------------,
Ages[rsize/2] Ages[floor(rsize/2)]
where Ages is an array containing the ages of the peers in the where Ages is an array containing the ages of the peers in the
routing table sorted in increasing order and rsize is the size of the routing table sorted in increasing order and rsize is the size of the
routing table. It has been shown that the estimate obtained by using routing table. It has been shown that the estimate obtained by using
this method is accurate within 22% of the real join rate this method is accurate within 22% of the real join rate
[ghinita2006]. Of course, the size of the routing table affects the [ghinita2006]. Of course, the size of the routing table affects the
accuracy. accuracy.
In order for this mechanism to work, peers need to exchange In order for this mechanism to work, peers need to exchange
information about the time they have been present in the overlay. information about the time they have been present in the overlay.
skipping to change at page 16, line 38 skipping to change at page 16, line 42
the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is, the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is,
the value 10628. the value 10628.
The 'type' field of the MessageExtension structure MUST be set to The 'type' field of the MessageExtension structure MUST be set to
contain the value 'self_tuning_data'. The 'critical' field of the contain the value 'self_tuning_data'. The 'critical' field of the
structure MUST be set to False. structure MUST be set to False.
A peer MUST store all estimates it receives in Probe requests and A peer MUST store all estimates it receives in Probe requests and
answers during a stabilization interval. When the stabilization answers during a stabilization interval. When the stabilization
timer fires, the peer MUST calculate the estimates to be used during timer fires, the peer MUST calculate the estimates to be used during
the next stabilization interval by taking the 75th percentile of a the next stabilization interval by taking the 75th percentile (i.e.,
data set containing its own estimate and the received estimates. third quartile) of a data set containing its own estimate and the
received estimates.
6.6. Calculating the Stabilization Interval 6.6. Calculating the Stabilization Interval
According to [liben-nowell2002], a Chord network in a ring-like state According to [liben-nowell2002], a Chord network in a ring-like state
remains in a ring-like state as long as peers send Omega(log2(N))^2 remains in a ring-like state as long as peers send
messages before N new peers join or N/2 peers fail. We can use the Omega(square(log(N))) messages before N new peers join or N/2 peers
estimate of peer failure rate, U, to calculate the time Tf in which N fail. We can use the estimate of peer failure rate, U, to calculate
/2 peers fail: the time Tf in which N/2 peers fail:
1 1
Tf = ------ Tf = ------
2*U 2*U
Based on this estimate, a stabilization interval Tstab-1 MUST be Based on this estimate, a stabilization interval Tstab-1 MUST be
calculated as: calculated as:
Tf Tf
Tstab-1 = ----------- Tstab-1 = -----------------
log2^2(N) square(log2(N))
On the other hand, the estimated join rate L can be used to calculate On the other hand, the estimated join rate L can be used to calculate
the time in which N new peers join the overlay. Based on the the time in which N new peers join the overlay. Based on the
estimate of L, a stabilization interval Tstab-2 MUST be calculated estimate of L, a stabilization interval Tstab-2 MUST be calculated
as: as:
N N
Tstab-2 = --------------- Tstab-2 = ---------------------
L * log2^2(N) L * square(log2(N))
Finally, the actual stabilization interval Tstab that MUST be used Finally, the actual stabilization interval Tstab that MUST be used
can be obtained by taking the minimum of Tstab-1 and Tstab-2. can be obtained by taking the minimum of Tstab-1 and Tstab-2.
The results obtained in [maenpaa2009] indicate that making the The results obtained in [maenpaa2009] indicate that making the
stabilization interval too small has the effect of making the overlay stabilization interval too small has the effect of making the overlay
less stable (e.g., in terms of detected loops and path failures). less stable (e.g., in terms of detected loops and path failures).
Thus, a lower limit should be used for the stabilization period. Thus, a lower limit should be used for the stabilization period.
Based on the results in [maenpaa2009], a lower limit of 15s is Based on the results in [maenpaa2009], a lower limit of 15s is
RECOMMENDED, since using a stabilization period smaller than this RECOMMENDED, since using a stabilization period smaller than this
skipping to change at page 18, line 4 skipping to change at page 18, line 8
adding one new element, "number-of-peers-to-probe", inside each adding one new element, "number-of-peers-to-probe", inside each
"configuration" element. "configuration" element.
self-tuning:number-of-peers-to-probe: The number of fingers to which self-tuning:number-of-peers-to-probe: The number of fingers to which
Probe requests are sent to obtain their network size, join rate, Probe requests are sent to obtain their network size, join rate,
and leave rate estimates. The default value is 4. and leave rate estimates. The default value is 4.
This new element is formally defined as follows: This new element is formally defined as follows:
namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning" namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning"
parameter &= element self-tuning:number-of-peers-to-probe { parameter &= element self-tuning:number-of-peers-to-probe {
xsd:unsignedInt } xsd:unsignedInt }
This namespace is added into the <mandatory-extension> element in the This namespace is added into the <mandatory-extension> element in the
overlay configuration file. overlay configuration file.
8. Security Considerations 8. Security Considerations
In the same way as malicious or compromised peers implementing the In the same way as malicious or compromised peers implementing the
RELOAD base protocol [I-D.ietf-p2psip-base] can advertise false RELOAD base protocol [RFC6940] can advertise false network metrics or
network metrics or distribute false routing table information for distribute false routing table information for instance in RELOAD
instance in RELOAD Update messages, malicious peers implementing this Update messages, malicious peers implementing this specification may
specification may share false join rate, leave rate, and network size share false join rate, leave rate, and network size estimates. For
estimates. For such attacks, the same security concerns apply as in such attacks, the same security concerns apply as in the RELOAD base
the RELOAD base specification. In addition, as long as the amount of specification. In addition, as long as the amount of malicious peers
malicious peers in the overlay remains modest, the statistical in the overlay remains modest, the statistical mechanisms applied in
mechanisms applied in Section 6.5 (i.e., the use of 75th percentiles) Section 6.5 (i.e., the use of 75th percentiles) to process the shared
to process the shared estimates a peer obtains help ensuring that estimates a peer obtains help ensuring that estimates that are
estimates that are clearly different from (i.e., larger or smaller clearly different from (i.e., larger or smaller than) other received
than) other received estimates will not significantly influence the estimates will not significantly influence the process of adapting
process of adapting the stabilization interval and routing table the stabilization interval and routing table size. However, it
size. should be noted that if an attacker is able to impersonate a high
number of other peers in the overlay in strategic locations, it may
be able to send a high enough number of false estimates to a victim
and therefore influence the victim's choice of a stabilization
interval.
9. IANA Considerations 9. IANA Considerations
9.1. Message Extensions 9.1. Message Extensions
This document introduces one additional extension to the "RELOAD This document introduces one additional extension to the "RELOAD
Extensions" Registry: Extensions" Registry:
+------------------+-------+---------------+ +------------------+-------+---------------+
| Extension Name | Code | Specification | | Extension Name | Code | Specification |
skipping to change at page 18, line 49 skipping to change at page 19, line 12
The contents of the extension are defined in Section 6.5. The contents of the extension are defined in Section 6.5.
Note to RFC Editor: please replace AAAA with the RFC number for this Note to RFC Editor: please replace AAAA with the RFC number for this
specification. specification.
10. Acknowledgments 10. Acknowledgments
The authors would like to thank Jani Hautakorpi for his contributions The authors would like to thank Jani Hautakorpi for his contributions
to the document. The authors would also like to thank Carlos to the document. The authors would also like to thank Carlos
Bernardos for his comments on the document. Bernardos and Martin Durst for their comments on the document.
11. References 11. References
11.1. Normative References 11.1. Normative References
[I-D.ietf-p2psip-base]
Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and
H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
Base Protocol", draft-ietf-p2psip-base-26 (work in
progress), February 2013.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT) (ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245, April Traversal for Offer/Answer Protocols", RFC 5245, April
2010. 2010.
[RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
"Session Traversal Utilities for NAT (STUN)", RFC 5389, "Session Traversal Utilities for NAT (STUN)", RFC 5389,
October 2008. October 2008.
[RFC6940] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and
H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
Base Protocol", RFC 6940, January 2014.
11.2. Informative References 11.2. Informative References
[CAN] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and S. [CAN] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and S.
Schenker, "A Scalable Content-Addressable Network", In Schenker, "A Scalable Content-Addressable Network", In
Proceedings of the 2001 Conference on Applications, Proceedings of the 2001 Conference on Applications,
Technologies, Architectures and Protocols for Computer Technologies, Architectures and Protocols for Computer
Communications pp. 161-172, August 2001. Communications pp. 161-172, August 2001.
[Chord] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., [Chord] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.,
Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A
skipping to change at page 21, line 23 skipping to change at page 21, line 30
Mahajan, R., Castro, M., and A. Rowstron, "Controlling the Mahajan, R., Castro, M., and A. Rowstron, "Controlling the
Cost of Reliability in Peer-to-Peer Overlays", In Cost of Reliability in Peer-to-Peer Overlays", In
Proceedings of the 2nd International Workshop on Peer-to- Proceedings of the 2nd International Workshop on Peer-to-
Peer Systems (IPTPS) pp. 21-32, February 2003. Peer Systems (IPTPS) pp. 21-32, February 2003.
[rhea2004] [rhea2004]
Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz, Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz,
"Handling Churn in a DHT", In Proceedings of the USENIX "Handling Churn in a DHT", In Proceedings of the USENIX
Annual Technical Conference pp. 127-140, June 2004. Annual Technical Conference pp. 127-140, June 2004.
[weiss1998]
Weiss, M., "Data Structures and Algorithm Analysis in
C++", Addison-Wesley Longman Publishin Co., Inc. 2nd
Edition, ISBN:0201361221, 1998.
Authors' Addresses Authors' Addresses
Jouni Maenpaa Jouni Maenpaa
Ericsson Ericsson
Hirsalantie 11 Hirsalantie 11
Jorvas 02420 Jorvas 02420
Finland Finland
Email: Jouni.Maenpaa@ericsson.com Email: Jouni.Maenpaa@ericsson.com
 End of changes. 36 change blocks. 
131 lines changed or deleted 144 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/