NFSv4                                                           C. Lever
Internet-Draft                                                    Oracle
Intended status: Informational                          November 2, 2015
Expires: May 5, 2016


          RPC-over-RDMA Version One Implementation Experience
         draft-ietf-nfsv4-rfc5666-implementation-experience-00

Abstract

   This document details experiences and challenges implementing the
   RPC-over-RDMA Version One protocol.  Specification changes are
   recommended to address avoidable interoperability failures.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 5, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.





Lever                      Expires May 5, 2016                  [Page 1]


Internet-Draft     RFC 5666 Implementation Experience      November 2015


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
     1.2.  Purpose Of This Document  . . . . . . . . . . . . . . . .   3
     1.3.  Updating RFC 5666 . . . . . . . . . . . . . . . . . . . .   4
   2.  RPC-Over-RDMA Essentials  . . . . . . . . . . . . . . . . . .   5
     2.1.  Arguments And Results . . . . . . . . . . . . . . . . . .   5
     2.2.  Remote Direct Memory Access . . . . . . . . . . . . . . .   5
       2.2.1.  Direct Data Placement . . . . . . . . . . . . . . . .   6
       2.2.2.  Channel Operation . . . . . . . . . . . . . . . . . .   6
       2.2.3.  Explicit RDMA Operation . . . . . . . . . . . . . . .   7
     2.3.  Transfer Models . . . . . . . . . . . . . . . . . . . . .   7
       2.3.1.  Read-Read . . . . . . . . . . . . . . . . . . . . . .   7
       2.3.2.  Write-Write . . . . . . . . . . . . . . . . . . . . .   7
       2.3.3.  Read-Write  . . . . . . . . . . . . . . . . . . . . .   8
     2.4.  Upper Layer Binding Specifications  . . . . . . . . . . .   8
     2.5.  On-The-Wire Protocol  . . . . . . . . . . . . . . . . . .   8
       2.5.1.  Inline Operation  . . . . . . . . . . . . . . . . . .   8
       2.5.2.  RDMA Segment  . . . . . . . . . . . . . . . . . . . .  11
       2.5.3.  Chunk . . . . . . . . . . . . . . . . . . . . . . . .  11
       2.5.4.  Read Chunk  . . . . . . . . . . . . . . . . . . . . .  12
       2.5.5.  Write Chunk . . . . . . . . . . . . . . . . . . . . .  12
       2.5.6.  Read List . . . . . . . . . . . . . . . . . . . . . .  13
       2.5.7.  Write List  . . . . . . . . . . . . . . . . . . . . .  14
       2.5.8.  Position Zero Read Chunk  . . . . . . . . . . . . . .  14
       2.5.9.  Reply Chunk . . . . . . . . . . . . . . . . . . . . .  15
   3.  Specification Issues  . . . . . . . . . . . . . . . . . . . .  15
     3.1.  Extensibility Considerations  . . . . . . . . . . . . . .  15
       3.1.1.  Recommendations . . . . . . . . . . . . . . . . . . .  16
     3.2.  XDR Clarifications  . . . . . . . . . . . . . . . . . . .  16
       3.2.1.  Recommendations . . . . . . . . . . . . . . . . . . .  18
     3.3.  The Position Zero Read Chunk  . . . . . . . . . . . . . .  19
       3.3.1.  Recommendations . . . . . . . . . . . . . . . . . . .  21
     3.4.  RDMA_NOMSG Call Messages  . . . . . . . . . . . . . . . .  21
       3.4.1.  Recommendations . . . . . . . . . . . . . . . . . . .  22
     3.5.  RDMA_MSG Call with Position Zero Read Chunk . . . . . . .  22
       3.5.1.  Recommendations . . . . . . . . . . . . . . . . . . .  23
     3.6.  Padding Inline Content After A Chunk  . . . . . . . . . .  23
       3.6.1.  Recommendations . . . . . . . . . . . . . . . . . . .  25
     3.7.  Write List XDR Roundup  . . . . . . . . . . . . . . . . .  25
       3.7.1.  Recommendations . . . . . . . . . . . . . . . . . . .  26
     3.8.  Write List Error Cases  . . . . . . . . . . . . . . . . .  26
       3.8.1.  Recommendations . . . . . . . . . . . . . . . . . . .  29
   4.  Operational Considerations  . . . . . . . . . . . . . . . . .  29
     4.1.  Computing Request Buffer Requirements . . . . . . . . . .  29
       4.1.1.  Recommendations . . . . . . . . . . . . . . . . . . .  30
     4.2.  Default Inline Buffer Size  . . . . . . . . . . . . . . .  30
       4.2.1.  Recommendations . . . . . . . . . . . . . . . . . . .  30
     4.3.  When To Use Reply Chunks  . . . . . . . . . . . . . . . .  30
       4.3.1.  Recommendations . . . . . . . . . . . . . . . . . . .  31
     4.4.  Computing Credit Values . . . . . . . . . . . . . . . . .  31
       4.4.1.  Recommendations . . . . . . . . . . . . . . . . . . .  32
     4.5.  Race Windows  . . . . . . . . . . . . . . . . . . . . . .  32
       4.5.1.  Recommendations . . . . . . . . . . . . . . . . . . .  32
   5.  Pre-requisites For NFSv4  . . . . . . . . . . . . . . . . . .  32
     5.1.  Bi-directional Operation  . . . . . . . . . . . . . . . .  32
       5.1.1.  Recommendations . . . . . . . . . . . . . . . . . . .  33
   6.  Considerations For Upper Layer Binding Specifications . . . .  33
     6.1.  Organization Of Binding Specification Requirements  . . .  33
       6.1.1.  Recommendations . . . . . . . . . . . . . . . . . . .  34
     6.2.  RDMA-Eligibility  . . . . . . . . . . . . . . . . . . . .  34
       6.2.1.  Recommendations . . . . . . . . . . . . . . . . . . .  35
     6.3.  Violations Of Binding Rules . . . . . . . . . . . . . . .  35
       6.3.1.  Recommendations . . . . . . . . . . . . . . . . . . .  36
     6.4.  Binding Specification Completion Assessment . . . . . . .  36
       6.4.1.  Recommendations . . . . . . . . . . . . . . . . . . .  37
   7.  Removal of Unimplemented Protocol Features  . . . . . . . . .  37
     7.1.  Read-Read Transfer Model  . . . . . . . . . . . . . . . .  37
       7.1.1.  Recommendations . . . . . . . . . . . . . . . . . . .  37
     7.2.  RDMA_MSGP . . . . . . . . . . . . . . . . . . . . . . . .  37
       7.2.1.  Recommendations . . . . . . . . . . . . . . . . . . .  38
   8.  Security Considerations . . . . . . . . . . . . . . . . . . .  38
   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  38
   10. Appendix A: XDR Language Description  . . . . . . . . . . . .  38
   11. Appendix B: Binding Requirement Summary . . . . . . . . . . .  41
   12. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  43
   13. References  . . . . . . . . . . . . . . . . . . . . . . . . .  43
     13.1.  Normative References . . . . . . . . . . . . . . . . . .  43
     13.2.  Informative References . . . . . . . . . . . . . . . . .  44
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  44

1.  Introduction

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

1.2.  Purpose Of This Document

   This document summarizes implementation experience with the RPC-over-
   RDMA Version One protocol [RFC5666], and proposes improvements to the
   protocol specification based on implementer experience, frequently-
   asked questions, and interviews with a co-author of RFC 5666.

   A key contribution of this document is to highlight areas of RFC 5666
   where independent good faith readings could result in distinct
   implementations that do not interoperate with each other.  Correcting
   these specification issues is critical: fresh implementations of RPC-
   over-RDMA Version One continue to arise.

   Recommendations are limited to the following areas:

   o  Repairing specification ambiguities

   o  Codifying successful implementation practices and conventions

   o  Clarifying the role of Upper Layer Binding specifications

   o  Exploring protocol enhancements that might be added while allowing
      extant implementations to interoperate with enhanced
      implementations

1.3.  Updating RFC 5666

   During IETF 92, several alternatives for updating RFC 5666 were
   discussed with the RFC Editor and with the assembled members of the
   nfsv4 Working Group.  Among them were:

   o  Filing individual errata for each issue

   o  Introducing a new RFC that updates but does not obsolete RFC 5666,
      and makes no change to the protocol

   o  Introducing an RFC 5666bis that replaces and thus obsoletes RFC
      5666, but makes no change to the protocol

   o  Introducing a new RFC that specifies RPC-over-RDMA Version Two

   An additional possibility, one sometimes chosen by other Working
   Groups, would be to update RFC 5666 as it transitions from Proposed
   Standard to Draft Standard.

   There was general agreement during the meeting regarding the need to
   update and obsolete RFC 5666 while retaining a high degree of
   interoperability with current RPC-over-RDMA Version One
   implementations.  This approach would avoid changes to on-the-wire
   behavior without burdening implementers, who could continue to
   reference a single specification of the protocol.  In addition, this
   alternative extends the life of current interoperable RPC-over-RDMA
   Version One implementations in the field.

   Subsequent discussion within the nfsv4 Working Group has focused on
   resolving specification ambiguities that make the construction of
   interoperable implementations unduly difficult.  A Version Two of
   RPC-over-RDMA, where deeper changes can be made and new functionality
   introduced, remains a possibility.

2.  RPC-Over-RDMA Essentials

   The following sections summarize the state of affairs defined in RFC
   5666.  This is a distillation of text from RFC 5666, dialog with a
   co-author of RFC 5666, and implementer experience.  The XDR
   definitions are copied from RFC 5666 Section 4.3.

2.1.  Arguments And Results

   Like a local function call, every Remote Procedure Call (RPC)
   operation has a set of one or more "arguments" and a set of one or
   more "results."  The calling context is not allowed to proceed until
   the function's results are available.  Unlike a local function call,
   the called function is executed remotely rather than in the local
   application's context.

   A client endpoint, or "requester", serializes an RPC call's arguments
   into a byte stream using XDR [RFC4506].  This "XDR stream" is
   conveyed to a server endpoint via an RPC call message (sometimes
   referred to as an "RPC request").

   The server endpoint, or "responder", deserializes the arguments and
   processes the requested operation.  It then serializes the
   operation's results into another XDR stream.  This stream is conveyed
   back to the client endpoint via an RPC reply message.  The client
   deserializes the results and allows the original caller to proceed.
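
   The serialization step described above can be sketched as follows.
   This is a minimal illustration, not text from RFC 5666: the
   procedure arguments (an opaque handle and an unsigned count) and
   all function names are hypothetical.  Only the encoding rules come
   from XDR [RFC4506]: an unsigned integer occupies four bytes in
   big-endian order, and variable-length opaque data is a four-byte
   length followed by the data, zero-padded to a multiple of four
   bytes.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """Encode an XDR unsigned integer: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_opaque(data: bytes) -> bytes:
    """Encode XDR variable-length opaque data.

    Layout: 4-byte length, the data itself, then zero bytes so the
    total length is a multiple of 4.
    """
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


def decode_uint(stream: bytes, offset: int):
    """Decode one unsigned integer; return (value, next offset)."""
    (value,) = struct.unpack_from(">I", stream, offset)
    return value, offset + 4


def decode_opaque(stream: bytes, offset: int):
    """Decode variable-length opaque data; return (data, next offset)."""
    length, offset = decode_uint(stream, offset)
    data = stream[offset:offset + length]
    padded = (length + 3) & ~3  # skip the pad bytes as well
    return data, offset + padded


def encode_call_args(handle: bytes, count: int) -> bytes:
    """Requester side: serialize hypothetical arguments to an XDR stream."""
    return xdr_opaque(handle) + xdr_uint(count)


def decode_call_args(stream: bytes):
    """Responder side: deserialize the same hypothetical arguments."""
    handle, offset = decode_opaque(stream, 0)
    count, offset = decode_uint(stream, offset)
    return handle, count


if __name__ == "__main__":
    stream = encode_call_args(b"abcde", 4096)
    print(stream.hex())
    print(decode_call_args(stream))
```

   Every item in the resulting stream begins on a four-byte boundary.
   This alignment property of XDR is what the later discussion of
   padding and roundup relies on.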

   The remainder of this document assumes a working knowledge of the RPC
   protocol [RFC5531] and especially XDR [RFC4506].

2.2.  Remote Direct Memory Access

   RPC messages may be very large.  For example, NFS READ and WRITE
   operations are often 100KB or larger.

   An RPC client system can be made more efficient if RPC messages are
   transferred by a third party such as intelligent network interface
   hardware.  Remote Direct Memory Access (RDMA) and Direct Data
   Placement (DDP) enable offloading data movement to avoid the
   negative performance effects of using traditional host CPU-based
   network operations to move bulk data.

   RFC 5666 describes how to use only the Send, Receive, RDMA Read, and
   RDMA Write operations described in [RFC5040] and [RFC5041] to move
   RPC calls and replies between requesters and responders.

2.2.1.  Direct Data Placement

   RFC 5666 makes an important distinction between RDMA and Direct Data
   Placement (DDP).

   Very often, RPC implementations copy the contents of RPC messages
   into a buffer before sending them.  A good RPC implementation may be
   able to send bulk data without having to copy it into a separate send
   buffer first.

   However, socket-based RPC implementations are often unable to receive
   data directly into its final place in memory.  Receivers often need
   to copy incoming data to finish an RPC operation.

   In RFC 5666, "RDMA" refers to the physical mechanism an RDMA
   transport utilizes when moving data.  Though it may not be optimal,
   before an RDMA transfer, the sender may still copy data into place.
   After an RDMA transfer, the receiver may copy that data again to its
   final destination.

   RFC 5666 uses the term "direct data placement" to refer to an
   optimization that makes it unnecessary for a host CPU to copy data to
   be transferred.  RPC-over-RDMA Version One utilizes RDMA Read and
   Write operations to enable DDP.  Not every RDMA-based transfer in
   RPC-over-RDMA Version One is DDP, however.

2.2.2.  Channel Operation

   A Send operation initiates the transfer of a message from a local
   endpoint to a remote endpoint, similar to a datagram send operation.

   The remote endpoint pre-posts Receive operations to catch incoming
   messages.  Send operations are flow-contro