--- 1/draft-ietf-nfsv4-flex-files-08.txt 2017-05-09 17:13:09.111846873 -0700
+++ 2/draft-ietf-nfsv4-flex-files-09.txt 2017-05-09 17:13:09.183848601 -0700
@@ -1,51 +1,51 @@
 NFSv4                                                          B. Halevy
 Internet-Draft
 Intended status: Standards Track                               T. Haynes
-Expires: November 10, 2016                                  Primary Data
-                                                            May 09, 2016
+Expires: November 10, 2017                                  Primary Data
+                                                            May 09, 2017

                 Parallel NFS (pNFS) Flexible File Layout
-                   draft-ietf-nfsv4-flex-files-08.txt
+                   draft-ietf-nfsv4-flex-files-09.txt

 Abstract

    The Parallel Network File System (pNFS) allows a separation between
    the metadata (onto a metadata server) and data (onto a storage
    device) for a file. The flexible file layout type is defined in this
-   document as an extension to pNFS to allow the use of storage devices
-   in a fashion such that they require only a quite limited degree of
-   interaction with the metadata server, using already existing
-   protocols. Client side mirroring is also added to provide
+   document as an extension to pNFS which allows the use of storage
+   devices in a fashion such that they require only a quite limited
+   degree of interaction with the metadata server, using already
+   existing protocols. Client side mirroring is also added to provide
    replication of files.

 Status of This Memo

    This Internet-Draft is submitted in full conformance with the
    provisions of BCP 78 and BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF). Note that other groups may also distribute
    working documents as Internet-Drafts. The list of current Internet-
    Drafts is at http://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on November 10, 2016.
+   This Internet-Draft will expire on November 10, 2017.
 Copyright Notice

-   Copyright (c) 2016 IETF Trust and the persons identified as the
+   Copyright (c) 2017 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
    carefully, as they describe your rights and restrictions with respect
    to this document. Code Components extracted from this document must
    include Simplified BSD License text as described in Section 4.e of
    the Trust Legal Provisions and are provided without warranty as
@@ -79,133 +79,131 @@
    8. Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . 20
      8.1. Selecting a Mirror . . . . . . . . . . . . . . . . . . . 21
      8.2. Writing to Mirrors . . . . . . . . . . . . . . . . . . . 21
      8.3. Metadata Server Resilvering of the File . . . . . . . . . 22
    9. Flexible Files Layout Type Return . . . . . . . . . . . . . . 22
      9.1. I/O Error Reporting . . . . . . . . . . . . . . . . . . . 23
        9.1.1. ff_ioerr4 . . . . . . . . . . . . . . . . . . . . . . 23
      9.2. Layout Usage Statistics . . . . . . . . . . . . . . . . . 24
        9.2.1. ff_io_latency4 . . . . . . . . . . . . . . . . . . . 24
        9.2.2. ff_layoutupdate4 . . . . . . . . . . . . . . . . . . 25
-       9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 25
-     9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 26
+       9.2.3. ff_iostats4 . . . . . . . . . . . . . . . . . . . . . 26
+     9.3. ff_layoutreturn4 . . . . . . . . . . . . . . . . . . . . 27
    10. Flexible Files Layout Type LAYOUTERROR . . . . . . . . . . . 27
    11. Flexible Files Layout Type LAYOUTSTATS . . . . . . . . . . . 27
-   12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 27
+   12. Flexible File Layout Type Creation Hint . . . . . . . . . . . 28
      12.1. ff_layouthint4 . . . . . . . . . . . . . . . . . . . . . 28
-   13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 28
-     13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 28
+   13. Recalling a Layout . . . . . . . . . . . . . . . . . . . . . 29
+     13.1. CB_RECALL_ANY . . . . . . . . . . . . . . . . . . . . . 29
-   14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 29
+   14. Client Fencing . . . . . . . . . . . . . . . . . . . . . . . 30
    15. Security Considerations . . . . . . . . . . . . . . . . . . . 30
-     15.1. Kerberized File Access . . . . . . . . . . . . . . . . . 30
+     15.1. Kerberized File Access . . . . . . . . . . . . . . . . . 31
        15.1.1. Loosely Coupled . . . . . . . . . . . . . . . . . . 31
        15.1.2. Tightly Coupled . . . . . . . . . . . . . . . . . . 31
-   16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31
-   17. References . . . . . . . . . . . . . . . . . . . . . . . . . 31
-     17.1. Normative References . . . . . . . . . . . . . . . . . . 31
-     17.2. Informative References . . . . . . . . . . . . . . . . . 32
-   Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 32
+   16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
+   17. References . . . . . . . . . . . . . . . . . . . . . . . . . 32
+     17.1. Normative References . . . . . . . . . . . . . . . . . . 32
+     17.2. Informative References . . . . . . . . . . . . . . . . . 33
+   Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 33
    Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 33
    Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33

 1. Introduction

    In the parallel Network File System (pNFS), the metadata server
    returns layout type structures that describe where file data is
    located. There are different layout types for different storage
    systems and methods of arranging data on storage devices.
    This document defines the flexible file layout type used with
    file-based data servers that are accessed using the Network File
    System (NFS) protocols: NFSv3 [RFC1813], NFSv4.0 [RFC7530], NFSv4.1
    [RFC5661], and NFSv4.2 [RFC7862].

    To provide a global state model equivalent to that of the files
    layout type, a back-end control protocol MAY be implemented between
    the metadata server and NFSv4.1+ storage devices. It is out of scope
-   for this document to specify the wire protocol of such a protocol,
-   yet the requirements for the protocol are specified in [RFC5661] and
-   clarified in [pNFSLayouts].
+   for this document to specify such a protocol, yet the requirements
+   for the protocol are specified in [RFC5661] and clarified in
+   [pNFSLayouts].

 1.1. Definitions

    control protocol: is a set of requirements for the communication of
       information on layouts, stateids, file metadata, and file data
       between the metadata server and the storage devices (see
       [pNFSLayouts]).

    client-side mirroring: is when the client and not the server is
       responsible for updating all of the mirrored copies of a layout
       segment.

-   data file: is that part of the file system object which describes
-      the payload and not the object. E.g., it is the file contents.
+   data file: is that part of the file system object which contains the
+      content.

    data server (DS): is one of the pNFS servers which provides the
       contents of a file system object which is a regular file.
       Depending on the layout, there might be one or more data servers
       over which the data is striped. Note that while the metadata
       server is strictly accessed over the NFSv4.1+ protocol, depending
       on the layout type, the data server could be accessed via any
       protocol that meets the pNFS requirements.

    fencing: is when the metadata server prevents the storage devices
       from processing I/O from a specific client to a specific file.

    file layout type: is a layout type in which the storage devices are
-      accessed via the NFS protocol.
+      accessed via the NFS protocol (see Section 13 of [RFC5661]).

    layout: informs a client of which storage devices it needs to
       communicate with (and over which protocol) to perform I/O on a
       file. The layout might also provide some hints about how the
       storage is physically organized.

    layout iomode: describes whether the layout granted to the client is
       for read or read/write I/O.

    layout segment: describes a sub-division of a layout. That sub-
       division might be by the iomode (see Sections 3.3.20 and 12.2.9 of
       [RFC5661]), a striping pattern (see Section 13.3 of [RFC5661]), or
       requested byte range.

    layout stateid: is a 128-bit quantity returned by a server that
       uniquely defines the layout state provided by the server for a
       specific layout that describes a layout type and file (see
-      Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 describes
-      the difference between a layout stateid and a normal stateid.
+      Section 12.5.2 of [RFC5661]). Further, Section 12.5.3 of
+      [RFC5661] describes the difference between a layout stateid and a
+      normal stateid.

    layout type: describes both the storage protocol used to access the
       data and the aggregation scheme used to lay out the file data on
       the underlying storage devices.

    loose coupling: is when the metadata server and the storage devices
       do not have a control protocol present.

    metadata file: is that part of the file system object which
-      describes the object and not the payload. E.g., it could be the
+      describes the object and not the content. E.g., it could be the
       time since last modification, access, etc.

    metadata server (MDS): is the pNFS server which provides metadata
       information for a file system object. It also is responsible for
       generating layouts for file system objects. Note that the MDS is
       responsible for directory-based operations.

-   mirror: is a copy of a layout segment. While mirroring can be used
-      for backing up a layout segment, the copies can be distributed
-      such that each remote site has a locally available copy. Note
-      that if one copy of the mirror is updated, then all copies must be
-      updated.
+   mirror: is a copy of a layout segment. Note that if one copy of the
+      mirror is updated, then all copies must be updated.

    recalling a layout: is when the metadata server uses a back channel
       to inform the client that the layout is to be returned in a
-      graceful manner. Note that the client could be able to flush any
-      writes, etc., before replying to the metadata server.
+      graceful manner. Note that the client has the opportunity to
+      flush any writes, etc., before replying to the metadata server.

    revoking a layout: is when the metadata server invalidates the layout
       such that neither the metadata server nor any storage device will
       accept any access from the client with that layout.

    resilvering: is the act of rebuilding a mirrored copy of a layout
       segment from a known good copy of the layout segment. Note that
       this can also be done to create a new mirrored copy of the layout
       segment.
@@ -683,22 +680,22 @@
    /// };
    ///

    The ff_layout4 structure specifies a layout over a set of mirrored
    copies of that portion of the data file described in the current
    layout segment. This mirroring protects against loss of data in
    layout segments.

    Note that while not explicitly shown in the above
    XDR, each layout4 element returned in the logr_layout array of
-   LAYOUTGET4res (see Section 18.43.1 of [RFC5661]) descibes a layout
-   segment. Hence each ff_layout4 also descibes a layout segment.
+   LAYOUTGET4res (see Section 18.43.1 of [RFC5661]) describes a layout
+   segment. Hence each ff_layout4 also describes a layout segment.

    It is possible that the file is concatenated from more than one
    layout segment. Each layout segment MAY represent different striping
    parameters, applying respectively only to the layout segment byte
    range.

    The ffl_stripe_unit field is the stripe unit size in use for the
    current layout segment. The number of stripes is given inside each
    mirror by the number of elements in ffm_data_servers. If the number
    of stripes is one, then the value for ffl_stripe_unit MUST default to
@@ -978,39 +975,57 @@
    device because it has no presence on the given subnet.

    As such, it is the client which decides which mirror to access for
    reading the file. The requirements for writing to a mirrored layout
    segments are presented below.

 8.2. Writing to Mirrors

    The client is responsible for updating all mirrored copies of the
    layout segments that it is given in the layout. A single failed
-   update is suffcient to fail the entire operation. I.e., if all but
+   update is sufficient to fail the entire operation. I.e., if all but
    one copy is updated successfully and the last one provides an error,
-   then the client needs to return the layout to the metadata server
-   with an error indicating that the update failed to that storage
-   device. If the client is updating the mirrors serially, then it
-   SHOULD stop at the first error encountered and report that to the
+   then the client needs to inform the metadata server about the error
+   via either LAYOUTRETURN or LAYOUTERROR that the update failed to that
+   storage device. If the client is updating the mirrors serially, then
+   it SHOULD stop at the first error encountered and report that to the
    metadata server. If the client is updating the mirrors in parallel,
    then it SHOULD wait until all storage devices respond such that it
    can report all errors encountered during the update.

    The metadata server is then responsible for determining if it wants
    to remove the errant mirror from the layout, if the mirror has
    recovered from some transient error, etc. When the client tries to
    get a new layout, the metadata server informs it of the decision by
    the contents of the layout.
    The client MUST NOT make any assumptions that the contents of the
    previous layout will match those of the new one. If it has updates
    that were not committed, it MUST resend those updates to all mirrors.

+   There is no provision in the protocol for the metadata server to
+   directly determine that the client has or has not recovered from an
+   error. I.e., assume that the storage device was network partitioned
+   from the client and all of the copies are successfully updated after
+   the error was reported. There is no mechanism for the client to
+   report that fact and the metadata server is forced to repair the file
+   across the mirror.
+
+   If the client supports NFSv4.2, it can use LAYOUTERROR and
+   LAYOUTRETURN to provide hints to the metadata server about the
+   recovery efforts. A LAYOUTERROR on a file is for a non-fatal error.
+   A subsequent LAYOUTRETURN without a ff_ioerr4 indicates that the
+   client successfully replayed the I/O to all mirrors. Any
+   LAYOUTRETURN with a ff_ioerr4 is an error that the metadata server
+   needs to repair. The client MUST be prepared for the LAYOUTERROR to
+   trigger a CB_LAYOUTRECALL if the metadata server determines it needs
+   to start repairing the file.
+
 8.3. Metadata Server Resilvering of the File

    The metadata server may elect to create a new mirror of the layout
    segments at any time. This might be to resilver a copy on a storage
    device which was down for servicing, to provide a copy of the layout
    segments on storage with different storage performance
    characteristics, etc. As the client will not be aware of the new
    mirror and the metadata server will not be aware of updates that the
    client is making to the layout segments, the metadata server MUST
    recall the writable layout segment(s) that it is resilvering. If the
@@ -1477,22 +1492,22 @@
               Protocol", RFC 5661, January 2010.

    [RFC5662]  Shepler, S., Ed., Eisler, M., Ed., and D.
               Noveck, Ed., "Network File System (NFS) Version 4 Minor
               Version 1 External Data Representation Standard (XDR)
               Description", RFC 5662, January 2010.

    [RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
               version 4 Protocol", RFC 7530, March 2015.

-   [RFC7862]  Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862, May
-              2016.
+   [RFC7862]  Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862,
+              November 2016.

    [pNFSLayouts]
               Haynes, T., "Requirements for pNFS Layout Types", draft-
               ietf-nfsv4-layout-types-04 (Work In Progress), January
               2016.

 17.2. Informative References

    [RFC4519]  Sciberras, A., Ed., "Lightweight Directory Access Protocol
               (LDAP): Schema for User Applications", RFC 4519, DOI