--- 1/draft-ietf-p2psip-self-tuning-00.txt 2010-03-01 10:11:06.000000000 +0100 +++ 2/draft-ietf-p2psip-self-tuning-01.txt 2010-03-01 10:11:06.000000000 +0100 @@ -1,20 +1,20 @@ P2PSIP Working Group J. Maenpaa Internet-Draft G. Camarillo Intended status: Standards Track J. Hautakorpi -Expires: June 11, 2010 Ericsson - December 8, 2009 +Expires: September 2, 2010 Ericsson + March 1, 2010 A Self-tuning Distributed Hash Table (DHT) for REsource LOcation And Discovery (RELOAD) - draft-ietf-p2psip-self-tuning-00.txt + draft-ietf-p2psip-self-tuning-01.txt Abstract REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) signaling protocol that provides an overlay network service. Peers in a RELOAD overlay network collectively run an overlay algorithm to organize the overlay, and to store and retrieve data. This document describes how the default topology plugin of RELOAD can be extended to support self-tuning, that is, to adapt to changing operating conditions such as churn and network size. @@ -33,25 +33,25 @@ and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. - This Internet-Draft will expire on June 11, 2010. + This Internet-Draft will expire on September 2, 2010. Copyright Notice - Copyright (c) 2009 IETF Trust and the persons identified as the + Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as @@ -60,37 +60,38 @@ Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 5 3.2. Configuring Periodic Stabilization . . . . . . . . . . . . 6 3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . . 7 4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 7 5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 - 5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10 + 5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 9 5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . . 10 5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . . 11 5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 11 - 5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . . 12 + 5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . . 11 5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12 - 5.6.1. Contents of the Leave Message . . . . . . . . . . . . 12 - 6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . . 13 - 6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 13 - 6.2. Determining Routing Table Size . . . . . . . . . . . . . . 14 - 6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 14 - 6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . . 15 - 6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . . 16 - 6.5. Calculating the Stabilization Interval . . . . . . . . . . 16 + 6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . . 12 + 6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 12 + 6.2. Determining Routing Table Size . . . . . . . . . . . . . . 13 + 6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 13 + 6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . . 14 + 6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . . 15 + 6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . . 15 + 6.6. Calculating the Stabilization Interval . . . . . . . . . . 16 7. Overlay Configuration Document Extension . . . . . . . . . . . 17 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 + 9.1. Message Extensions . . . . . . . . . . . . . . . . . . . . 18 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 10.1. Normative References . . . . . . . . . . . . . . . . . . . 18 10.2. Informative References . . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 1. Introduction REsource LOcation And Discovery (RELOAD) [I-D.ietf-p2psip-base] is a peer-to-peer signaling protocol that can be used to maintain an overlay network, and to store data in and retrieve data from the @@ -171,41 +172,33 @@ O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means that f(n) is less than some constant multiple of g(n). Omega(g(n)): Informally, saying that some equation f(n) = Omega(g(n)) means that f(n) is more than some constant multiple of g(n). Predecessor List: A data structure containing the predecessors of a peer on the Chord ring. - Responsible Peer: The peer storing the original copy of a resource. - In Chord, this is the first peer whose identifier is equal to or - follows the identifier of the resource on the Chord ring. Also - the term "Root Node" can be used for this concept. - Routing Table: The set of peers that a node can use to route overlay messages. The routing table consists of the finger table, successor list and predecessor list. Successor List: A data structure containing the first r successors of a peer on the Chord ring. 3. Introduction to Stabilization in DHTs - DHT-based peer-to-peer overlay networks are self-organizing, scalable - and reliable. However, these features come at a cost: peers in the - overlay network need to consume network bandwidth to maintain routing - state. DHTs use stabilization routines to counter the undesirable - effects of churn on routing. The purpose of stabilization is to keep - the routing information of each peer in the overlay consistent with - the constantly changing overlay topology. There are two alternative + DHTs use stabilization routines to counter the undesirable effects of + churn on routing. The purpose of stabilization is to keep the + routing information of each peer in the overlay consistent with the + constantly changing overlay topology. There are two alternative approaches to stabilization: periodic and reactive [rhea2004]. Periodic stabilization can either use a fixed stabilization rate or calculate the stabilization rate in an adaptive fashion. 3.1. Reactive vs. Periodic Stabilization In reactive stabilization, a peer reacts to the loss of a peer in its neighborhood set or to the appearance of a new peer that should be added to its neighborhood set by sending a copy of its neighbor table to all peers in the neighborhood set. Periodic recovery, in @@ -257,21 +250,21 @@ problem is setting the parameters so that the overlay achieves the desired reliability and performance even in challenging conditions, such as under heavy churn. This naturally results in high cost during periods when the churn level is lower than expected, or alternatively, poor performance or even network partitioning in worse than expected conditions. In addition to selecting an appropriate stabilization interval, regardless of whether periodic stabilization is used or not, an appropriate size needs to be selected for the neighborhood set and - for the routing table. + for the finger table. The current approach is to configure overlays statically. This works in situations where perfect information about the future is available. In situations where the operating conditions of the network are known in advance and remain static throughout the lifetime of the system, it is possible to choose fixed optimal values for parameters such as stabilization rate, neighborhood set size and routing table size. However, if the operating conditions (e.g., the size of the overlay and its churn rate) do not remain static but evolve with time, it is not possible to achieve both a low lookup @@ -417,28 +410,28 @@ In finger stabilization, a peer periodically searches for new peers to include in its finger table. This time defaults to one hour. This document specifies how the neighbor stabilization and finger stabilization intervals can be determined in an adaptive fashion based on the operating conditions of the overlay. The subsections below describe how this document extends the four components of stabilization. 5.1. Update Requests - The neighbor and finger stabilization procedures are implemented - using the Update request defined in RELOAD base - [I-D.ietf-p2psip-base]. RELOAD base defines three types of Update - requests: 'peer_ready', 'neighbors', and 'full'. Regardless of the - type, all Update requests include an 'uptime' field. Since the self- - tuning extensions require information on the uptimes of peers in the - routing table, the sender of an Update request MUST include its - current uptime in seconds in the 'uptime' field. + As described in RELOAD base [I-D.ietf-p2psip-base], the neighbor and + finger stabilization procedures are implemented using Update + requests. RELOAD base defines three types of Update requests: + 'peer_ready', 'neighbors', and 'full'. Regardless of the type, all + Update requests include an 'uptime' field. Since the self-tuning + extensions require information on the uptimes of peers in the routing + table, the sender of an Update request MUST include its current + uptime in seconds in the 'uptime' field. When self-tuning is used, each peer decides independently the appropriate size for the successor list, predecessor list and finger table. Thus, the 'predecessors', 'successors', and 'fingers' fields included in RELOAD Update requests are of variable length. As specified in RELOAD [I-D.ietf-p2psip-base], variable length fields are on the wire preceded by length bytes. In the case of the successor list, predecessor list, and finger table, there are two length bytes (allowing lengths up to 2^16-1). The number of NodeId structures included in each field can be calculated based on the @@ -453,23 +446,22 @@ In the neighbor stabilization operation of chord-reload, a peer periodically sends an Update request to every peer in its Connection Table. In a small, low-churn overlay, the amount of traffic this process generates is typically acceptable. However, in a large-scale overlay churning at a moderate or high churn rate, the traffic load may no longer be acceptable since the size of the connection table is large and the stabilization interval relatively short. The self- tuning mechanisms described in this document are especially designed for overlays of the latter type. Therefore, when the self-tuning - mechanisms are used, each peer MUST only send a periodic Update - request to its first predecessor and first successor on the Chord - ring. + mechanisms are used, each peer MUST send a periodic Update request + only to its first predecessor and first successor on the Chord ring. The neighbor stabilization routine MUST be executed when the stabilization timer fires. To begin the neighbor stabilization routine, a peer MUST send an Update request to its first successor and its first predecessor. The type of the Update request MUST be 'neighbors'. The Update request MUST include the successor and predecessor lists of the sender. If a peer receiving such an Update request learns from the predecessor and successor lists included in the request that new peers can be included in its neighborhood set, it MUST send Attach requests to the new peers. @@ -515,77 +507,34 @@ the size of its finger table each time when the stabilization timer fires. The way this is done is explained in Section 6.2. 5.5. Detecting Partitioning This document does not require any changes to the mechanism chord- reload uses to detect network partitioning. 5.6. Leaving the Overlay - This document extends the Leave request defined in RELOAD base - [I-D.ietf-p2psip-base]. As specified in RELOAD base, a leaving peer + As specified in RELOAD base [I-D.ietf-p2psip-base], a leaving peer SHOULD send a Leave request to all members of its neighbor table - prior to leaving the overlay. When the self-tuning extensions are - used, the Leave requests sent to the first predecessor and first - successor contain additional data. The Leave request that is sent to - the first successor MUST contain the predecessor list of the leaving - peer. The Leave request that is sent to the first predecessor MUST - contain the successor list of the leaving peer. If the first - successor can identify better predecessors than are already included - in its predecessor list by investigating the predecessor list it - receives from the leaving peer, it MUST send Attach requests to them. - Similarly, if the first predecessor identifies better successors by + prior to leaving the overlay. The overlay_specific_data field MUST + contain the ChordLeaveData structure. The Leave requests that are + sent to successors MUST contain the predecessor list of the leaving + peer. The Leave requests that are sent to the predecessors MUST + contain the successor list of the leaving peer. If a given successor + can identify better predecessors than are already included in its + predecessor lists by investigating the predecessor list it receives + from the leaving peer, it MUST send Attach requests to them. + Similarly, if a given predecessor identifies better successors by investigating the successor list it receives from the leaving peer, it MUST send Attach requests to them. -5.6.1. Contents of the Leave Message - - The Leave request defined in RELOAD base [I-D.ietf-p2psip-base] has a - field for overlay specific data. When the self-tuning extensions are - being used, this overlay_specific_data field MUST contain the - ChordLeaveData structure defined below: - - enum { reserved (0), - from_succ(1), from_pred(2), (255) } - ChordLeaveType; - - struct { - ChordLeaveType type; - - select(type) { - case from_succ: - NodeId successors<0..2^16-1>; - case from_pred: - NodeId predecessors<0..2^16-1>; - }; - } ChordLeaveData; - - The 'type' field indicates whether the Leave request was sent by a - predecessor or a successor of the recipient: - - from_succ - The Leave request was sent by a successor. - - from_pred - The Leave request was sent by a predecessor. - - If the type of the request is 'from_succ', the contents will be: - - successors - The sender's successor list. - - If the type of the request is 'from_pred', the contents will be: - - predecessors - The sender's predecessor list. - 6. Self-tuning Chord Parameters This section specifies how to determine an appropriate stabilization rate and routing table size in an adaptive fashion. The proposed mechanism is based on [mahajan2003], [liben-nowell2002], and [ghinita2006]. To calculate an appropriate stabilization rate, the values of three parameters MUST be estimated: overlay size N, failure rate U, and join rate L. To calculate an appropriate routing table size, the estimated network size N can be used. Peers in the overlay MUST re-calculate the values of the parameters to self-tune the @@ -672,34 +621,36 @@ may cause a peer to overreact each time a new failure occurs. In [ghinita2006], K is set 25% of the routing table size. Use of this approach is RECOMMENDED. 6.3.1. Detecting Failures A new failure MUST be inserted to the failure history in the following cases: 1. A Leave request is received from a neigbhor. - 2. No data has been received from a peer in the Connection Table for - a specified amount of time. RELOAD mandates the use of STUN - [RFC5389] for keepalives. STUN keepalives take the form of STUN - Binding Indication transactions. As specified in ICE - [I-D.ietf-mmusic-ice], a peer sends a STUN Binding Indication if - there has been no packet sent on a connection for Tr seconds. Tr - is configurable and has a default of 15 seconds. Although STUN - Binding Indications do not generate a response, the fact that a - peer has failed can be learned from the lack of packets (Binding - Indications or application protocol packets) received from the - peer. A peer MUST be considered to have failed if no packets - have been received from it for the past max-silent-periods*Tr - seconds. max-silent-periods is a new element in the overlay - configuration document. The default value is three. + 2. A peer fails to reply to a Ping request sent in the situation + explained below. If no packets have been received on a + connection during the past 2*Tr seconds (where Tr is the + inactivity timer defined by ICE [I-D.ietf-mmusic-ice]), a RELOAD + Ping request MUST be sent to the remote peer. RELOAD mandates + the use of STUN [RFC5389] for keepalives. STUN keepalives take + the form of STUN Binding Indication transactions. As specified + in ICE [I-D.ietf-mmusic-ice], a peer sends a STUN Binding + Indication if there has been no packet sent on a connection for + Tr seconds. Tr is configurable and has a default of 15 seconds. + Although STUN Binding Indications do not generate a response, the + fact that a peer has failed can be learned from the lack of + packets (Binding Indications or application protocol packets) + received from the peer. If the remote peer fails to reply to the + Ping request, the sender MUST consider the remote peer to have + failed. As an alternative to relying on STUN keepalives to detect peer failure, a peer could send additional, frequent RELOAD messages to every peer in its Connection Table. These messages could be Update requests, in which case they would serve two purposes: detecting peer failure and stabilization. However, as the cost of this approach can be very high in terms of bandwidth consumption and traffic load, especially in large-scale overlays experiencing churn, its use is NOT RECOMMENDED. @@ -730,21 +681,70 @@ In order for this mechanism to work, peers need to exchange information about the time they have been present in the overlay. Peers receive the uptimes of their successors and predecessors during the stabilization operations since all Update requests carry uptime values. A joining peer learns the uptime of the admitting peer since it receives an Update from the admitting peer during the join procedure. Peers learn the uptimes of new fingers since they can fetch the uptime using a Probe request after having attached to the new finger. -6.5. Calculating the Stabilization Interval +6.5. Estimate Sharing + + To improve the accuracy of network size, join rate, and leave rate + estimates, peers MUST share their estimates. When the stabilization + timer fires, a peer MUST select number-of-peers-to-probe random peers + from its finger table and send each of them a Probe request. The + targets of Probe requests are selected from the finger table rather + than from the neighbor table since neighbors are likely to make + similar errors when calculating their estimates. number-of-peers-to- + probe is a new element in the overlay configuration document. It is + defined in Section 7 and has a default value of 4. Both the Probe + request and the answer returned by the target peer MUST contain a new + message extension whose MessageExtensionType is 'self_tuning_data'. + This extension type is defined in Section 9.1. The + extension_contents field of the MessageExtension structure MUST + contain a SelfTuningData structure: + + struct { + uint32 network_size; + uint32 join_rate; + uint32 leave_rate; + } SelfTuningData; + + The contents of the SelfTuningData structure are as follows: + + network_size + The latest network size estimate calculated by the sender. + + join_rate + The latest join rate estimate calculated by the sender. + + leave_rate + The latest leave rate estimate calculated by the sender. + + The join and leave rates are expressed as joins or failures per 24 + hours. As an example, if the global join rate estimate a peer has + calculated is 0.123 peers/s, it would include in the join_rate field + the value 10627 (24*60*60*0.123 = 10627.2). + + The 'type' field of the MessageExtension structure MUST be set to + contain the value 'self_tuning_data'. The 'critical' field of the + structure MUST be set to False. + + A peer MUST store all estimates it receives in Probe requests and + answers during a stabilization interval. When the stabilization + timer fires, the peer MUST calculate the estimates to be used during + the next stabilization interval by taking the 75th percentile of a + data set containing its own estimate and the received estimates. + +6.6. Calculating the Stabilization Interval According to [liben-nowell2002], a Chord network in a ring-like state remains in a ring-like state as long as peers send Omega(log2^2(N)) messages before N new peers join or N/2 peers fail. We can use the estimate of peer failure rate, U, to calculate the time Tf in which N/2 peers fail: 1 Tf = ------ 2*U @@ -772,52 +772,61 @@ stabilization interval too small has the effect of making the overlay less stable (e.g., in terms of detected loops and path failures). Thus, a lower limit should be used for the stabilization period. Based on the results in [maenpaa2009], a lower limit of 15s is RECOMMENDED, since using a stabilization period smaller than this will with a high probability cause too much traffic in the overlay. 7. Overlay Configuration Document Extension This document extends the RELOAD overlay configuration document by - adding one new element, "max-silent-periods", inside each + adding one new element, "number-of-peers-to-probe", inside each "configuration" element. - max-silent-periods: A peer MUST be considered to have failed if no - packets have been received from it for the past max-silent- - periods*Tr seconds (Tr is defined in [I-D.ietf-mmusic-ice]). The - default value of the max-silent-periods element is 3. + number-of-peers-to-probe: The number of fingers to which Probe + requests are sent to obtain their network size, join rate, and + leave rate estimates. The default value is 4. 8. Security Considerations There are no new security considerations introduced in this document. The security considerations of RELOAD [I-D.ietf-p2psip-base] apply. 9. IANA Considerations +9.1. Message Extensions - This document has no actions for IANA. + This document introduces one additional extension to the "RELOAD + Extensions" Registry: + + +------------------+-------+---------------+ + | Extension Name | Code | Specification | + +------------------+-------+---------------+ + | self_tuning_data | 1 | RFC-AAAA | + +------------------+-------+---------------+ + + The contents of the extension are defined in Section 6.5. 10. References 10.1. Normative References [I-D.ietf-mmusic-ice] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", draft-ietf-mmusic-ice-19 (work in progress), October 2007. [I-D.ietf-p2psip-base] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) - Base Protocol", draft-ietf-p2psip-base-06 (work in - progress), November 2009. + Base Protocol", draft-ietf-p2psip-base-07 (work in + progress), February 2010. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, "Session Traversal Utilities for NAT (STUN)", RFC 5389, October 2008. 10.2. Informative References