Network File System Version 4                               T. Myklebust
Internet-Draft                                                    NetApp
Expires: January 7, 2010                                    July 6, 2009


 Network File System (NFS) version 4 pNFS back end protocol extensions
                 draft-myklebust-nfsv4-pnfs-backend-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 7, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.










Myklebust                Expires January 7, 2010                [Page 1]


Internet-Draft      pNFS back end protocol extensions          July 2009


Abstract

   This document describes an extension to the NFSv4.1 draft protocol to
   allow NFS clients to act as pNFS data servers towards other NFS
   clients.

   The intention is to reduce the load on the actual data servers by
   allowing some trusted clients to share the contents of their data
   caches with other clients.










































Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


Table of Contents

   1.  Introduction
   2.  Description of the proposed data sharing model
     2.1.  NFS client acting as a pure pNFS client
     2.2.  Meta data server responsibilities
     2.3.  NFS client acting as a pNFS data server
   3.  Security considerations
   4.  State expiration and recovery considerations
   5.  Structured Data Types
     5.1.  proxy_identifier4
   6.  New client operations
     6.1.  REGISTER_DS - Offer to act as a data server
       6.1.1.  ARGUMENTS
       6.1.2.  RESULTS
       6.1.3.  DESCRIPTION
     6.2.  UNREGISTER_DS - Revoke offer to act as a data server
       6.2.1.  ARGUMENTS
       6.2.2.  RESULTS
       6.2.3.  DESCRIPTION
     6.3.  PROXY_OPEN - Check proxy access rights to a file
       6.3.1.  ARGUMENTS
       6.3.2.  RESULTS
       6.3.3.  DESCRIPTION
   7.  New callback operations
     7.1.  CB_PROXY_REVOKE - revoke proxy access rights to a file
       7.1.1.  ARGUMENTS
       7.1.2.  RESULTS
       7.1.3.  DESCRIPTION
   8.  References
   Author's Address













1.  Introduction

   The object of this proposal is to allow further scale out of NFS
   traffic by allowing NFS clients to share the contents of their file
   data caches with other NFS clients.

   The model assumes a server workload in which a number of read-only
   files are commonly accessed by more than one client at a time.  A
   typical use case would be one in which the exported filesystem
   contains a set of libraries, e.g. a UNIX /lib partition, a set of
   CAD/CAM objects, or a collection of PHP modules and other static
   webserver data.  On such systems, a common problem occurs when the
   cluster boots and possibly all clients need to access the same
   library data at roughly the same time.  The server bandwidth is
   then consumed by serving the same data over and over again.

   It is not obvious that use of the pNFS scale out mode is sufficient
   to avoid this kind of congestion.  The fundamental problem is that
   NFS clients are all accessing the same data, which are striped over
   the same data servers.  The effect may therefore simply be to move
   the bottleneck from the metadata server over onto the data servers.

   Current methods of reducing the impact of this congestion typically
   require the user to dedicate extra resources for the boot process.
   They include preloading the data on the client in local permanent
   caches (a.k.a. cachefs), replication of the shared data across
   several NFS servers and setting up NFS proxy servers.

   Another solution, which does not require the use of dedicated
   resources, is the peer-to-peer model in which the first few clients
   to read the data from the server are allowed to share the contents
   of their cache with the next waves of clients.  This document
   attempts to enable such a model by allowing clients which have
   already cached data to act as pNFS data servers toward their peers.
   It does so by defining a control protocol, in the sense defined in
   Section 12.2.6 of [draft-ietf-nfsv4-minorversion1-29], that enables
   the data server to enforce layouts and to negotiate authentication
   and authorization information with the server.













2.  Description of the proposed data sharing model

2.1.  NFS client acting as a pure pNFS client

   The proposal implies no protocol changes for NFSv4.1 clients that
   wish only to act as pNFS clients, in order to access the cached data
   from other clients.  These clients will request file layouts from the
   metadata server using the LAYOUTGET operation in the usual fashion.
   Should the server return NFS4ERR_LAYOUTUNAVAILABLE, then the client
   proceeds to read from the file through the metadata server in the
   usual manner.  Otherwise, the client interprets the returned file
   layout in the manner specified by Section 13 of
   [draft-ietf-nfsv4-minorversion1-29] (NFSv4.1 as a Storage Protocol in
   pNFS: the File Layout Type).
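
   The client-side decision described above can be sketched as follows.
   This is only an illustration: the function name choose_read_path and
   the string-valued Status enum are hypothetical, and the enum values
   are symbolic names rather than on-the-wire nfsstat4 numbers.

```python
from enum import Enum


class Status(Enum):
    # Symbolic names only; these are not the on-the-wire nfsstat4 values.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_LAYOUTUNAVAILABLE = "NFS4ERR_LAYOUTUNAVAILABLE"


def choose_read_path(layoutget_status):
    """Return which server a pure pNFS client reads from after LAYOUTGET."""
    if layoutget_status is Status.NFS4ERR_LAYOUTUNAVAILABLE:
        # No layout on offer: read through the metadata server as usual.
        return "metadata_server"
    if layoutget_status is Status.NFS4_OK:
        # A file layout was returned: read from the data servers it names.
        return "data_server"
    # Anything else is left to the normal NFSv4.1 error handling paths.
    raise ValueError(f"unhandled LAYOUTGET status: {layoutget_status!r}")
```

   A pure pNFS client needs nothing beyond this branch; it cannot tell
   whether the data server named in the layout is a dedicated server or
   a peer client sharing its cache.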

2.2.  Meta data server responsibilities

   The metadata server has the usual responsibilities as dictated by
   Section 13 of [draft-ietf-nfsv4-minorversion1-29].  It maintains the
   list of available data servers for each file, manages the layout
   requests from pNFS clients, responds to PROXY_OPEN requests from data
   servers, and ensures that PROXY_OPEN stateids are revoked when the
   corresponding layout is revoked.

2.3.  NFS client acting as a pNFS data server

   A client that wishes to act as a data server is required to notify
   the metadata server of that intention using the REGISTER_DS
   operation.  Depending on the circumstances, the client may opt to
   register as a data server for all cached files, for just a single
   filesystem, for a collection of filesystems, for a collection of
   specific files, or for just a single file.

   As stated in the introduction, the design assumes that the sharing of
   cached data between NFS clients will reduce the amount of NFS traffic
   to the permanent storage medium.  It therefore only makes sense to
   invoke this model in the case when the server knows that the client
   that is acting as a data server is caching the file data
   aggressively.  To ensure that this is the case, the metadata server
   MUST NOT issue a layout that references a data server unless that
   data server holds a read delegation for the file in question.

   Conversely, a client that is registered to act as a data server, and
   that receives a READ request for a file for which it does not hold a
   delegation, MUST reject that request with the error
   NFS4ERR_PNFS_NO_LAYOUT.

   When the data server receives a READ request from a client with a
   stateid or a data server filehandle that it does not recognise, it
   attempts to validate that request using the PROXY_OPEN call.  This
   operation will convert the data server filehandle as provided by the
   layout into a real filehandle that the data server can use to access
   the file on the metadata server.  In order to make it easy for the
   data server to identify the file, the real filehandle SHOULD match
   the filehandle that was returned to the data server (in its role as
   an NFS client) when it received the read delegation.

   The PROXY_OPEN call also checks the access rights that were granted
   by the layout and the READ stateid for validity.  If the pNFS client
   in question does not hold a layout for this file, the PROXY_OPEN
   request from the data server will return NFS4ERR_PNFS_NO_LAYOUT.  In
   this case, the data server should not attempt to service the READ
   request, but should pass the error on to the pNFS client.

   If file access was verified by PROXY_OPEN, the data server can then
   attempt to service the READ request from its cache.  Should it fail
   to find the data in its cache, the data server should attempt to
   retrieve it from the parent server.
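
   The READ handling described in this section might be sketched as
   below.  The class and callable names (DataServer, proxy_open, fetch,
   ProxyOpenError) are hypothetical, status codes are shown as symbolic
   strings, and the order of the checks is an assumption: this document
   does not say whether the delegation check precedes or follows
   PROXY_OPEN.

```python
class ProxyOpenError(Exception):
    """Raised by the hypothetical proxy_open callable; carries an error name."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


class DataServer:
    def __init__(self, proxy_open, fetch):
        self.proxy_open = proxy_open  # (ds_fh, stateid) -> real filehandle
        self.fetch = fetch            # (real_fh, offset, count) -> bytes
        self.delegations = set()      # real filehandles with a read delegation
        self.validated = {}           # (ds_fh, stateid) -> real filehandle
        self.cache = {}               # (real_fh, offset, count) -> bytes

    def read(self, ds_fh, stateid, offset, count):
        key = (ds_fh, stateid)
        if key not in self.validated:
            # Unrecognised filehandle or stateid: validate via PROXY_OPEN.
            try:
                self.validated[key] = self.proxy_open(ds_fh, stateid)
            except ProxyOpenError as err:
                # Do not service the READ; pass the error on to the client.
                return err.status, b""
        real_fh = self.validated[key]
        if real_fh not in self.delegations:
            # No read delegation held for this file: reject the READ.
            return "NFS4ERR_PNFS_NO_LAYOUT", b""
        block = (real_fh, offset, count)
        if block not in self.cache:
            # Cache miss: retrieve the data from the parent server.
            self.cache[block] = self.fetch(real_fh, offset, count)
        return "NFS4_OK", self.cache[block]
```

   Keying the validation table on the (filehandle, stateid) pair means
   that a later CB_PROXY_REVOKE only has to remove that one entry to
   force the next READ back through PROXY_OPEN.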

   When layouts are returned to the metadata server, the data server is
   made responsible for fencing off any further READ requests.  To do
   so, the metadata server sends a CB_PROXY_REVOKE callback to the data
   server (or servers) that are referenced by that layout.  Upon
   receiving the CB_PROXY_REVOKE callback, the data server should match
   the filehandle and stateid arguments to the data filehandle that was
   previously used as an argument to the PROXY_OPEN request, and the
   stateid that was returned by that request.  Should the client attempt
   to reuse the same data filehandle and stateid in a future READ
   request, then the data server SHOULD revalidate the client's access
   using another PROXY_OPEN RPC call to the metadata server.

   An NFS client can at any time revoke its offer to act as a data
   server by using the UNREGISTER_DS operation.  This operation takes a
   single stateid, as returned by the original REGISTER_DS request.
   When the metadata server receives such a request, it must immediately
   revoke all layouts that reference that particular data server.  It
   does not need to send a CB_PROXY_REVOKE notification to the data
   server that is unregistering; however, it MUST notify any other data
   servers that are referenced by the same layout.










3.  Security considerations

   As per Section 13.1 in [draft-ietf-nfsv4-minorversion1-29], it is
   expected that metadata servers will need to encode server routing
   information in the data server filehandles.  To enable this, the
   REGISTER_DS request includes a 64-bit cookie argument that the
   metadata server is required to store.  It is then required to encode
   that 64-bit cookie in the first 64 bits of the data server
   filehandle.
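
   As an illustration of this filehandle layout, the following sketch
   prepends the 8-byte cookie to an opaque body and recovers it again.
   The function names and the contents of the body are hypothetical;
   only the position and size of the cookie come from this document.
   The 128-byte limit is the general NFSv4 filehandle size limit.

```python
NFS4_MDS_IDENTIFIER_SIZE = 8  # matches the constant in Section 6.1.1
NFS4_FHSIZE = 128             # maximum size of an NFSv4 filehandle


def make_ds_filehandle(mds_identifier: bytes, body: bytes) -> bytes:
    """Build a data server filehandle: cookie first, opaque body after."""
    if len(mds_identifier) != NFS4_MDS_IDENTIFIER_SIZE:
        raise ValueError("mds_identifier must be exactly 8 bytes")
    fh = mds_identifier + body
    if len(fh) > NFS4_FHSIZE:
        raise ValueError("filehandle exceeds NFS4_FHSIZE")
    return fh


def mds_identifier_of(fh: bytes) -> bytes:
    """Recover the cookie so the data server can pick the right MDS."""
    if len(fh) < NFS4_MDS_IDENTIFIER_SIZE:
        raise ValueError("filehandle too short to carry a cookie")
    return fh[:NFS4_MDS_IDENTIFIER_SIZE]
```

   A data server registered with several metadata servers could use the
   recovered cookie as a lookup key to decide where to send PROXY_OPEN.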

   All operations from the data server to the metadata server, including
   any operations required to refill the file cache in order to satisfy
   a READ request by the pNFS client, should be authenticated using a
   principal of the form "nfsd/hostname@REALM".  It is, however,
   expected that this requirement will be obsoleted should the proposal
   for RPCSEC_GSSv3 [draft-williams-rpcsecgssv3] be approved.  In that
   case, the data server may instead choose to create a process
   credential that asserts the credentials of the pNFS client.

































4.  State expiration and recovery considerations

   Should the pNFS client's session expire on the metadata server, then
   the latter is required to recall all layouts from the data servers
   using the CB_PROXY_REVOKE callback.  Upon re-establishing the
   session, the pNFS client then proceeds to follow the usual state
   recovery routine, including layout recovery.

   Should the pNFS client's session expire on the data server, then the
   client is required to recover that session before it can issue a new
   READ request.  In that case, the data server MUST assume that all
   existing layouts have been revoked.  Should the pNFS client attempt
   to assert a layout, then it MUST be validated using a PROXY_OPEN
   call.

   Should the data server's session expire on the metadata server, then
   the metadata server MUST revoke all layouts that reference that data
   server.  It should also consider as invalid any REGISTER_DS requests
   that the data server had issued.  After recovering its session, the
   data server MAY reissue the REGISTER_DS requests.

   Finally, if the metadata server crashes, then the data server SHOULD
   assert all REGISTER_DS requests as part of the recovery process.
   Once that is done, it must also assume that all layouts have been
   revoked, and that any attempt to reuse them MUST be revalidated using
   a PROXY_OPEN request.  Otherwise, both it and the pNFS client perform
   the normal NFS client recovery process.
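
   The data server's side of these rules might look like the sketch
   below.  All names are hypothetical, and "treating a layout as
   revoked" is modelled simply as forgetting the cached PROXY_OPEN
   result, so that the next READ has to be revalidated.

```python
class DataServerState:
    def __init__(self):
        self.registrations = []  # arguments of earlier REGISTER_DS calls
        self.validated = {}      # (client_id, ds_fh, stateid) -> proxy stateid

    def on_client_session_expired(self, client_id):
        # Assume every layout held by that pNFS client has been revoked.
        self.validated = {
            key: value
            for key, value in self.validated.items()
            if key[0] != client_id
        }

    def on_mds_restart(self, register_ds):
        # Re-assert the REGISTER_DS requests, then assume that all
        # layouts have been revoked and must be revalidated.
        for args in self.registrations:
            register_ds(args)
        self.validated.clear()
```

   The same on_mds_restart path could serve the case where only the
   data server's session on the metadata server expired, since reissuing
   the REGISTER_DS requests is permitted there as well.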

























5.  Structured Data Types

5.1.  proxy_identifier4

     union proxy_identifier4 switch (uint32_t flavor) {
     case RPCSEC_GSS:
           principal_arg           pid_principal;
     case AUTH_SYS:
           struct authsys_parms    pid_authsys;
     default:
           void;
     };

   The proxy_identifier4 data type is used to identify the user on
   behalf of which the data server is issuing a PROXY_OPEN.
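
   For illustration, the union could be serialised with standard XDR
   rules as sketched below.  Two points are assumptions: this document
   does not define principal_arg, so it is treated here as a
   variable-length opaque, and the helper names are hypothetical.  The
   flavor numbers and the authsys_parms field order are those of ONC
   RPC.

```python
import struct

AUTH_SYS = 1    # ONC RPC authentication flavor numbers
RPCSEC_GSS = 6


def xdr_opaque(data: bytes) -> bytes:
    """Variable-length opaque: 4-byte length, data, zero pad to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_authsys_parms(stamp, machinename, uid, gid, gids):
    out = struct.pack(">I", stamp)
    out += xdr_opaque(machinename.encode("ascii"))  # string<255>
    out += struct.pack(">II", uid, gid)
    out += struct.pack(">I", len(gids))             # gids<16> element count
    out += b"".join(struct.pack(">I", g) for g in gids)
    return out


def encode_proxy_identifier4(flavor, arm=b""):
    head = struct.pack(">I", flavor)  # the union discriminant
    if flavor == RPCSEC_GSS:
        # Assumption: principal_arg is an opaque<> holding the principal.
        return head + xdr_opaque(arm)
    if flavor == AUTH_SYS:
        # arm is an already encoded authsys_parms structure.
        return head + arm
    return head  # default arm is void
```

   For example, encode_proxy_identifier4(AUTH_SYS,
   encode_authsys_parms(0, "host", 1000, 1000, [10])) yields the
   discriminant followed by the encoded credential.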




































6.  New client operations

6.1.  REGISTER_DS - Offer to act as a data server

6.1.1.  ARGUMENTS

     const NFS4_MDS_IDENTIFIER_SIZE       = 8;

     enum register_ds_type4 {
           REGISTER_DS_ALL                 = 0,
           REGISTER_DS_FILESYSTEM          = 1,
           REGISTER_DS_ADD_FILESYSTEM      = 2,
           REGISTER_DS_FILE                = 3,
           REGISTER_DS_ADD_FILE            = 4
     };

     typedef opaque mds_identifier4[NFS4_MDS_IDENTIFIER_SIZE];

     union register_ds switch (register_ds_type4 ds_type) {
     case REGISTER_DS_ALL:
           mds_identifier4 rea_mds_identifier;
     case REGISTER_DS_FILESYSTEM:
           /* CURRENT_FH: file on filesystem being re-exported */
           mds_identifier4 rea_mds_identifier;
     case REGISTER_DS_ADD_FILESYSTEM:
           /* CURRENT_FH: file on filesystem being re-exported */
           stateid4        rea_dataserver_stateid;
     case REGISTER_DS_FILE:
           /* CURRENT_FH: file being re-exported */
           mds_identifier4 rea_mds_identifier;
     case REGISTER_DS_ADD_FILE:
           /* CURRENT_FH: file being re-exported */
           stateid4        rea_dataserver_stateid;
     };

     struct REGISTER_DS4args {
           register_ds     rea_dsinfo;
     };

6.1.2.  RESULTS

     union REGISTER_DS4res switch (nfsstat4 status) {
     case NFS4_OK:
           stateid4        res_dataserver_stateid;
     default:
           void;
     };




6.1.3.  DESCRIPTION

   The REGISTER_DS operation allows an NFS client to signal to the
   metadata server its ability to act as a data server towards other NFS
   clients.  Note that the MDS MUST NOT issue a layout for a file for
   which the client does not hold a read delegation.

   The client can register an intention to export all files for which it
   holds a read delegation, using the argument REGISTER_DS_ALL.

   It is also anticipated that some NFS setups may have the ability to
   set a caching and/or re-exporting policy.  For such setups, it is
   possible to set more fine-grained data server policies:

      REGISTER_DS_FILESYSTEM allows the client to specify that it wants
      to be a data server for a specific filesystem only.

      REGISTER_DS_ADD_FILESYSTEM allows the client to specify that it
      wants to add a filesystem to the data server represented by the
      stateid 'rea_dataserver_stateid'.

      REGISTER_DS_FILE allows the client to specify that it wants to
      serve a particular file only.

      REGISTER_DS_ADD_FILE allows the client to specify that it wants to
      add a file to the data server represented by the stateid
      'rea_dataserver_stateid'.

   The client should also supply a unique 64-bit identifier in the
   argument rea_mds_identifier.  The metadata server should place this
   identifier in the first 8 bytes of any data server filehandle that
   it issues for this data server, so that the data server can use it
   to identify the MDS to which the filehandle belongs.

   On success, the server returns the stateid 'res_dataserver_stateid'
   which acts to identify the data server in future REGISTER_DS calls,
   and in UNREGISTER_DS calls.

   The client may in fact hold several data server stateids, and use
   them to manage the overall policy.
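
   One way a metadata server might track these registrations is
   sketched below.  The class name, the use of small integers as
   stateids, and the use of the current filehandle as a stand-in for
   the filesystem are all hypothetical.  Two behaviours are
   assumptions that this document does not specify: the ADD variants
   return the same stateid, and an unknown stateid fails with
   NFS4ERR_BAD_STATEID.

```python
import itertools


class DataServerRegistry:
    def __init__(self):
        self._next = itertools.count(1)
        self.entries = {}  # data server stateid -> registration record

    def register(self, ds_type, current_fh=None, mds_identifier=None,
                 stateid=None):
        if ds_type in ("REGISTER_DS_ADD_FILESYSTEM", "REGISTER_DS_ADD_FILE"):
            # Extend the policy of an existing data server stateid.
            entry = self.entries.get(stateid)
            if entry is None:
                return "NFS4ERR_BAD_STATEID", None  # assumed error code
        else:
            # New registration: remember the cookie, issue a new stateid.
            stateid = next(self._next)
            entry = {"mds_identifier": mds_identifier, "all": False,
                     "filesystems": set(), "files": set()}
            self.entries[stateid] = entry
        if ds_type == "REGISTER_DS_ALL":
            entry["all"] = True
        elif ds_type.endswith("FILESYSTEM"):
            entry["filesystems"].add(current_fh)
        else:
            entry["files"].add(current_fh)
        return "NFS4_OK", stateid
```

   Because each record is keyed by its own stateid, a client that holds
   several data server stateids can withdraw one policy with
   UNREGISTER_DS without disturbing the others.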

6.2.  UNREGISTER_DS - Revoke offer to act as a data server

6.2.1.  ARGUMENTS

     struct UNREGISTER_DS4args {
           stateid4        una_dataserver_stateid;
     };








6.2.2.  RESULTS

     struct UNREGISTER_DS4res {
           nfsstat4        status;
     };

6.2.3.  DESCRIPTION

   When the MDS receives an UNREGISTER_DS operation, it must
   immediately invalidate all state associated with the data server
   stateid 'una_dataserver_stateid'.

   It MUST therefore revoke all layouts that refer to the data server
   that is represented by una_dataserver_stateid.

   After revoking the layouts, the MDS MUST no longer issue layouts for
   these files and filesystems using the data server represented by
   una_dataserver_stateid.
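
   The revocation step might be sketched as follows, combining the
   rules above with the notification rule from Section 2.3.  The data
   model (a mapping from layout identifier to the set of data servers
   it references) and the callback name are hypothetical.

```python
def unregister_ds(layouts, ds_id, send_cb_proxy_revoke):
    """Revoke every layout that references ds_id; return the survivors.

    layouts maps a layout identifier to the set of data server
    identifiers that the layout references.
    """
    remaining = {}
    for layout_id, servers in layouts.items():
        if ds_id not in servers:
            remaining[layout_id] = servers
            continue
        # The unregistering data server needs no notification, but any
        # other data server referenced by the same layout does.
        for other in sorted(servers - {ds_id}):
            send_cb_proxy_revoke(other, layout_id)
    return remaining
```

   The sketch deals only with layouts that were already issued; keeping
   the unregistered data server out of future layouts is a separate
   bookkeeping step.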

6.3.  PROXY_OPEN - Check proxy access rights to a file

   This takes a data server filehandle, the read stateid, and a proxy
   user identifier, and returns the true filehandle on success.

6.3.1.  ARGUMENTS

     struct PROXY_OPEN4args {
           /* CURRENT_FH: "data server filehandle" */
           proxy_identifier4       popa_user_id;
           stateid4                popa_read_stateid;
     };

6.3.2.  RESULTS

     union PROXY_OPEN4res switch (nfsstat4 status) {
           case NFS4_OK:
                   /* CURRENT_FH: true filehandle */
                   stateid4        popr_proxy_stateid;
           default:
                   void;
     };

6.3.3.  DESCRIPTION

   The PROXY_OPEN operation authenticates the READ request by the pNFS
   client.  If the data server filehandle is valid, and the user
   identified by the popa_user_id is authorised to access the file,
   then the metadata server returns the true filehandle (as returned by
   LOOKUP and/or OPEN) of the file.

   If the pNFS client does not currently hold a layout for this file,
   then the PROXY_OPEN request should fail with the error
   NFS4ERR_PNFS_NO_LAYOUT.

   If the data server filehandle argument cannot be translated into a
   valid metadata server filehandle, then the errors NFS4ERR_STALE,
   NFS4ERR_BADHANDLE, or NFS4ERR_FHEXPIRED should be returned, as
   appropriate.

   If the stateid argument does not correspond to a valid open stateid,
   delegation stateid, or lock stateid for the file that the pNFS
   client is attempting to READ, then the metadata server should return
   the appropriate error.

   In case of success, the metadata server returns a stateid
   "popr_proxy_stateid" that is used by the CB_PROXY_REVOKE callback to
   identify which layout is being revoked.
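
   The checks above might be arranged on the metadata server as in the
   following sketch.  The dictionary layout of the server state is
   hypothetical, and so is the order of the checks.  This document
   leaves two error codes open, so NFS4ERR_BAD_STATEID (invalid
   stateid) and NFS4ERR_ACCESS (unauthorised user) are assumptions.
   NFS4ERR_BADHANDLE stands in for whichever of the three filehandle
   errors applies.

```python
def proxy_open(mds, ds_fh, user_id, read_stateid):
    """Return (status, true filehandle, proxy stateid) for a PROXY_OPEN."""
    real_fh = mds["fh_map"].get(ds_fh)
    if real_fh is None:
        # The data server filehandle cannot be translated.
        return "NFS4ERR_BADHANDLE", None, None
    client = mds["stateids"].get((real_fh, read_stateid))
    if client is None:
        # Not a valid open, delegation, or lock stateid for this file.
        return "NFS4ERR_BAD_STATEID", None, None
    if (client, real_fh) not in mds["layouts"]:
        # The pNFS client holds no layout for this file.
        return "NFS4ERR_PNFS_NO_LAYOUT", None, None
    if user_id not in mds["readers"].get(real_fh, set()):
        # The proxied user is not authorised to read the file.
        return "NFS4ERR_ACCESS", None, None
    # Record the proxy stateid so CB_PROXY_REVOKE can name it later.
    proxy_stateid = ("proxy", ds_fh, read_stateid)
    mds["proxy_opens"].add(proxy_stateid)
    return "NFS4_OK", real_fh, proxy_stateid
```

   Recording each proxy stateid at the moment it is issued gives the
   metadata server the list it needs when the corresponding layout is
   later returned or revoked.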
































7.  New callback operations

7.1.  CB_PROXY_REVOKE - revoke proxy access rights to a file

   This takes a data server filehandle, and a proxy open stateid, and
   revokes them.

7.1.1.  ARGUMENTS

     struct CB_PROXY_REVOKE4args {
           nfs_fh4         pra_object;
           stateid4        pra_proxy_stateid;
     };

7.1.2.  RESULTS

     struct CB_PROXY_REVOKE4res {
           nfsstat4        prr_status;
     };

7.1.3.  DESCRIPTION

   pra_object is the data server filehandle for the file, whereas
   pra_proxy_stateid is the stateid that was returned by the PROXY_OPEN
   operation.

   Upon receiving this callback, the data server MUST invalidate all
   state associated with the stateid pra_proxy_stateid, and return
   NFS4_OK.

   If the filehandle was not found, the data server MUST return
   NFS4ERR_BADHANDLE.  If the stateid was not found, it MUST return
   NFS4ERR_BAD_STATEID.
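
   A data server's handler for this callback might be sketched as
   below.  The function name and the shape of the state table (a
   mapping from data server filehandle to the set of proxy stateids
   obtained for it) are hypothetical; the three return values are the
   ones named above, shown as symbolic strings.

```python
def cb_proxy_revoke(proxy_state, pra_object, pra_proxy_stateid):
    """Handle CB_PROXY_REVOKE on the data server.

    proxy_state maps a data server filehandle to the set of proxy
    stateids that PROXY_OPEN returned for it.
    """
    stateids = proxy_state.get(pra_object)
    if stateids is None:
        return "NFS4ERR_BADHANDLE"
    if pra_proxy_stateid not in stateids:
        return "NFS4ERR_BAD_STATEID"
    # Invalidate the state; a later READ that reuses this filehandle
    # and stateid has to go through PROXY_OPEN again.
    stateids.remove(pra_proxy_stateid)
    return "NFS4_OK"
```

   The sketch keeps the filehandle entry after its last stateid is
   removed, so a repeated revoke of the same stateid reports
   NFS4ERR_BAD_STATEID rather than NFS4ERR_BADHANDLE.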


















8.  References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119.

   [draft-ietf-nfsv4-minorversion1-29]
              Shepler, S., Eisler, M., and D. Noveck, "NFS Version 4
              Minor Version 1", draft-ietf-nfsv4-minorversion1 29.

   [draft-williams-rpcsecgssv3]
              Williams, N., "Remote Procedure Call (RPC) Security
              Version 3", draft-williams-rpcsecgssv3 00.







































Author's Address

   Trond Myklebust
   NetApp
   3215 Bellflower Ct
   Ann Arbor, MI  48103
   USA

   Phone: +1-734-662-6608
   Email: Trond.Myklebust@netapp.com








































