draft-ietf-nfsv4-minorversion2-39.txt   draft-ietf-nfsv4-minorversion2-40.txt 
NFSv4 T. Haynes NFSv4 T. Haynes
Internet-Draft Primary Data Internet-Draft Primary Data
Intended status: Standards Track September 01, 2015 Intended status: Standards Track January 06, 2016
Expires: March 4, 2016 Expires: July 9, 2016
NFS Version 4 Minor Version 2 NFS Version 4 Minor Version 2
draft-ietf-nfsv4-minorversion2-39.txt draft-ietf-nfsv4-minorversion2-40.txt
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version two, This Internet-Draft describes NFS version 4 minor version two,
describing the protocol extensions made from NFS version 4 minor describing the protocol extensions made from NFS version 4 minor
version 1. Major extensions introduced in NFS version 4 minor version 1. Major extensions introduced in NFS version 4 minor
version two include: Server Side Copy, Application I/O Advise, Space version two include: Server Side Copy, Application Input/Output (I/O)
Reservations, Sparse Files, Application Data Blocks, and Labeled NFS. Advise, Space Reservations, Sparse Files, Application Data Blocks,
and Labeled NFS.
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
skipping to change at page 1, line 40 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 4, 2016. This Internet-Draft will expire on July 9, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. The NFS Version 4 Minor Version 2 Protocol . . . . . . . 4 1.1. Scope of This Document . . . . . . . . . . . . . . . . . 5
1.2. Scope of This Document . . . . . . . . . . . . . . . . . 5 1.2. NFSv4.2 Goals . . . . . . . . . . . . . . . . . . . . . . 5
1.3. NFSv4.2 Goals . . . . . . . . . . . . . . . . . . . . . . 5 1.3. Overview of NFSv4.2 Features . . . . . . . . . . . . . . 5
1.4. Overview of NFSv4.2 Features . . . . . . . . . . . . . . 5 1.3.1. Server Side Clone and Copy . . . . . . . . . . . . . 5
1.4.1. Server Side Copy . . . . . . . . . . . . . . . . . . 5 1.3.2. Application Input/Output (I/O) Advise . . . . . . . . 6
1.4.2. Application I/O Advise . . . . . . . . . . . . . . . 6 1.3.3. Sparse Files . . . . . . . . . . . . . . . . . . . . 6
1.4.3. Sparse Files . . . . . . . . . . . . . . . . . . . . 6 1.3.4. Space Reservation . . . . . . . . . . . . . . . . . . 6
1.4.4. Space Reservation . . . . . . . . . . . . . . . . . . 6 1.3.5. Application Data Block (ADB) Support . . . . . . . . 6
1.4.5. Application Data Block (ADB) Support . . . . . . . . 6 1.3.6. Labeled NFS . . . . . . . . . . . . . . . . . . . . . 7
1.4.6. Labeled NFS . . . . . . . . . . . . . . . . . . . . . 6 1.3.7. Layout Enhancements . . . . . . . . . . . . . . . . . 7
1.5. Enhancements to Minor Versioning Model . . . . . . . . . 7 1.4. Enhancements to Minor Versioning Model . . . . . . . . . 7
2. Minor Versioning . . . . . . . . . . . . . . . . . . . . . . 7 2. Minor Versioning . . . . . . . . . . . . . . . . . . . . . . 7
3. pNFS considerations for New Operations . . . . . . . . . . . 8 3. pNFS considerations for New Operations . . . . . . . . . . . 8
3.1. Atomicity for ALLOCATE and DEALLOCATE . . . . . . . . . . 8 3.1. Atomicity for ALLOCATE and DEALLOCATE . . . . . . . . . . 8
3.2. Sharing of stateids with NFSv4.1 . . . . . . . . . . . . 8 3.2. Sharing of stateids with NFSv4.1 . . . . . . . . . . . . 8
3.3. NFSv4.2 as a Storage Protocol in pNFS: the File Layout 3.3. NFSv4.2 as a Storage Protocol in pNFS: the File Layout
Type . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Type . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3.1. Operations Sent to NFSv4.2 Data Servers . . . . . . . 8 3.3.1. Operations Sent to NFSv4.2 Data Servers . . . . . . . 9
4. Server Side Copy . . . . . . . . . . . . . . . . . . . . . . 9 4. Server Side Copy . . . . . . . . . . . . . . . . . . . . . . 9
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 9 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 9
4.2. Protocol Overview . . . . . . . . . . . . . . . . . . . . 9 4.2. Protocol Overview . . . . . . . . . . . . . . . . . . . . 9
4.2.1. Copy Operations . . . . . . . . . . . . . . . . . . . 10 4.2.1. Copy Operations . . . . . . . . . . . . . . . . . . . 10
4.2.2. Requirements for Operations . . . . . . . . . . . . . 10 4.2.2. Requirements for Operations . . . . . . . . . . . . . 11
4.3. Requirements for Inter-Server Copy . . . . . . . . . . . 11 4.3. Requirements for Inter-Server Copy . . . . . . . . . . . 12
4.4. Implementation Considerations . . . . . . . . . . . . . . 11 4.4. Implementation Considerations . . . . . . . . . . . . . . 12
4.4.1. Locking the Files . . . . . . . . . . . . . . . . . . 11 4.4.1. Locking the Files . . . . . . . . . . . . . . . . . . 12
4.4.2. Client Caches . . . . . . . . . . . . . . . . . . . . 12 4.4.2. Client Caches . . . . . . . . . . . . . . . . . . . . 13
4.5. Intra-Server Copy . . . . . . . . . . . . . . . . . . . . 12 4.5. Intra-Server Copy . . . . . . . . . . . . . . . . . . . . 13
4.6. Inter-Server Copy . . . . . . . . . . . . . . . . . . . . 13 4.6. Inter-Server Copy . . . . . . . . . . . . . . . . . . . . 14
4.7. Server-to-Server Copy Protocol . . . . . . . . . . . . . 17 4.7. Server-to-Server Copy Protocol . . . . . . . . . . . . . 18
4.7.1. Considerations on Selecting a Copy Protocol . . . . . 17 4.7.1. Considerations on Selecting a Copy Protocol . . . . . 18
4.7.2. Using NFSv4.x as the Copy Protocol . . . . . . . . . 17 4.7.2. Using NFSv4.x as the Copy Protocol . . . . . . . . . 18
4.7.3. Using an Alternative Copy Protocol . . . . . . . . . 17 4.7.3. Using an Alternative Copy Protocol . . . . . . . . . 18
4.8. netloc4 - Network Locations . . . . . . . . . . . . . . . 18 4.8. netloc4 - Network Locations . . . . . . . . . . . . . . . 19
4.9. Copy Offload Stateids . . . . . . . . . . . . . . . . . . 19 4.9. Copy Offload Stateids . . . . . . . . . . . . . . . . . . 20
4.10. Security Considerations . . . . . . . . . . . . . . . . . 19 4.10. Security Considerations . . . . . . . . . . . . . . . . . 20
4.10.1. Inter-Server Copy Security . . . . . . . . . . . . . 20 4.10.1. Inter-Server Copy Security . . . . . . . . . . . . . 21
5. Support for Application IO Hints . . . . . . . . . . . . . . 27
6. Sparse Files . . . . . . . . . . . . . . . . . . . . . . . . 27 5. Support for Application I/O Hints . . . . . . . . . . . . . . 28
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 27 6. Sparse Files . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 28 6.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 29
6.3. New Operations . . . . . . . . . . . . . . . . . . . . . 28 6.2. New Operations . . . . . . . . . . . . . . . . . . . . . 30
6.3.1. READ_PLUS . . . . . . . . . . . . . . . . . . . . . . 28 6.2.1. READ_PLUS . . . . . . . . . . . . . . . . . . . . . . 30
6.3.2. DEALLOCATE . . . . . . . . . . . . . . . . . . . . . 29 6.2.2. DEALLOCATE . . . . . . . . . . . . . . . . . . . . . 30
7. Space Reservation . . . . . . . . . . . . . . . . . . . . . . 29 7. Space Reservation . . . . . . . . . . . . . . . . . . . . . . 30
8. Application Data Block Support . . . . . . . . . . . . . . . 31 8. Application Data Block Support . . . . . . . . . . . . . . . 32
8.1. Generic Framework . . . . . . . . . . . . . . . . . . . . 32 8.1. Generic Framework . . . . . . . . . . . . . . . . . . . . 33
8.1.1. Data Block Representation . . . . . . . . . . . . . . 32 8.1.1. Data Block Representation . . . . . . . . . . . . . . 34
8.2. An Example of Detecting Corruption . . . . . . . . . . . 33 8.2. An Example of Detecting Corruption . . . . . . . . . . . 34
8.3. Example of READ_PLUS . . . . . . . . . . . . . . . . . . 34 8.3. Example of READ_PLUS . . . . . . . . . . . . . . . . . . 36
8.4. An Example of Zeroing Space . . . . . . . . . . . . . . . 35 8.4. An Example of Zeroing Space . . . . . . . . . . . . . . . 36
9. Labeled NFS . . . . . . . . . . . . . . . . . . . . . . . . . 35 9. Labeled NFS . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 35 9.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 37
9.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . 36 9.2. MAC Security Attribute . . . . . . . . . . . . . . . . . 38
9.3. MAC Security Attribute . . . . . . . . . . . . . . . . . 37 9.2.1. Delegations . . . . . . . . . . . . . . . . . . . . . 39
9.3.1. Delegations . . . . . . . . . . . . . . . . . . . . . 38 9.2.2. Permission Checking . . . . . . . . . . . . . . . . . 39
9.3.2. Permission Checking . . . . . . . . . . . . . . . . . 38 9.2.3. Object Creation . . . . . . . . . . . . . . . . . . . 39
9.3.3. Object Creation . . . . . . . . . . . . . . . . . . . 38 9.2.4. Existing Objects . . . . . . . . . . . . . . . . . . 40
9.3.4. Existing Objects . . . . . . . . . . . . . . . . . . 38 9.2.5. Label Changes . . . . . . . . . . . . . . . . . . . . 40
9.3.5. Label Changes . . . . . . . . . . . . . . . . . . . . 39 9.3. pNFS Considerations . . . . . . . . . . . . . . . . . . . 40
9.4. pNFS Considerations . . . . . . . . . . . . . . . . . . . 39 9.4. Discovery of Server Labeled NFS Support . . . . . . . . . 41
9.5. Discovery of Server Labeled NFS Support . . . . . . . . . 39 9.5. MAC Security NFS Modes of Operation . . . . . . . . . . . 41
9.6. MAC Security NFS Modes of Operation . . . . . . . . . . . 40 9.5.1. Full Mode . . . . . . . . . . . . . . . . . . . . . . 41
9.6.1. Full Mode . . . . . . . . . . . . . . . . . . . . . . 40 9.5.2. Guest Mode . . . . . . . . . . . . . . . . . . . . . 43
9.6.2. Guest Mode . . . . . . . . . . . . . . . . . . . . . 41 9.6. Security Considerations for Labeled NFS . . . . . . . . . 43
9.7. Security Considerations for Labeled NFS . . . . . . . . . 42
10. Sharing change attribute implementation characteristics with 10. Sharing change attribute implementation characteristics with
NFSv4 clients . . . . . . . . . . . . . . . . . . . . . . . . 42 NFSv4 clients . . . . . . . . . . . . . . . . . . . . . . . . 43
11. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 43 11. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 44
11.1. Error Definitions . . . . . . . . . . . . . . . . . . . 43 11.1. Error Definitions . . . . . . . . . . . . . . . . . . . 44
11.1.1. General Errors . . . . . . . . . . . . . . . . . . . 43 11.1.1. General Errors . . . . . . . . . . . . . . . . . . . 45
11.1.2. Server to Server Copy Errors . . . . . . . . . . . . 43 11.1.2. Server to Server Copy Errors . . . . . . . . . . . . 45
11.1.3. Labeled NFS Errors . . . . . . . . . . . . . . . . . 44 11.1.3. Labeled NFS Errors . . . . . . . . . . . . . . . . . 46
11.2. New Operations and Their Valid Errors . . . . . . . . . 44 11.2. New Operations and Their Valid Errors . . . . . . . . . 46
11.3. New Callback Operations and Their Valid Errors . . . . . 48 11.3. New Callback Operations and Their Valid Errors . . . . . 50
12. New File Attributes . . . . . . . . . . . . . . . . . . . . . 49 12. New File Attributes . . . . . . . . . . . . . . . . . . . . . 51
12.1. New RECOMMENDED Attributes - List and Definition 12.1. New RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 49 References . . . . . . . . . . . . . . . . . . . . . . . 51
12.2. Attribute Definitions . . . . . . . . . . . . . . . . . 50 12.2. Attribute Definitions . . . . . . . . . . . . . . . . . 52
13. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 52 13. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 54
14. Modifications to NFSv4.1 Operations . . . . . . . . . . . . . 56 14. Modifications to NFSv4.1 Operations . . . . . . . . . . . . . 58
14.1. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 56 14.1. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 58
14.2. Operation 48: GETDEVICELIST - Get All Device Mappings 14.2. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 57 for a File System . . . . . . . . . . . . . . . . . . . 59
15. NFSv4.2 Operations . . . . . . . . . . . . . . . . . . . . . 59 15. NFSv4.2 Operations . . . . . . . . . . . . . . . . . . . . . 61
15.1. Operation 59: ALLOCATE - Reserve Space in A Region of a 15.1. Operation 59: ALLOCATE - Reserve Space in A Region of a
File . . . . . . . . . . . . . . . . . . . . . . . . . . 59 File . . . . . . . . . . . . . . . . . . . . . . . . . . 61
15.2. Operation 60: COPY - Initiate a server-side copy . . . . 60
15.2. Operation 60: COPY - Initiate a server-side copy . . . . 62
15.3. Operation 61: COPY_NOTIFY - Notify a source server of a 15.3. Operation 61: COPY_NOTIFY - Notify a source server of a
future copy . . . . . . . . . . . . . . . . . . . . . . 64 future copy . . . . . . . . . . . . . . . . . . . . . . 66
15.4. Operation 62: DEALLOCATE - Unreserve Space in a Region 15.4. Operation 62: DEALLOCATE - Unreserve Space in a Region
of a File . . . . . . . . . . . . . . . . . . . . . . . 66 of a File . . . . . . . . . . . . . . . . . . . . . . . 68
15.5. Operation 63: IO_ADVISE - Application I/O access pattern 15.5. Operation 63: IO_ADVISE - Application I/O access pattern
hints . . . . . . . . . . . . . . . . . . . . . . . . . 68 hints . . . . . . . . . . . . . . . . . . . . . . . . . 70
15.6. Operation 64: LAYOUTERROR - Provide Errors for the 15.6. Operation 64: LAYOUTERROR - Provide Errors for the
Layout . . . . . . . . . . . . . . . . . . . . . . . . . 73
15.7. Operation 65: LAYOUTSTATS - Provide Statistics for the
Layout . . . . . . . . . . . . . . . . . . . . . . . . . 76 Layout . . . . . . . . . . . . . . . . . . . . . . . . . 76
15.7. Operation 65: LAYOUTSTATS - Provide Statistics for the
Layout . . . . . . . . . . . . . . . . . . . . . . . . . 79
15.8. Operation 66: OFFLOAD_CANCEL - Stop an Offloaded 15.8. Operation 66: OFFLOAD_CANCEL - Stop an Offloaded
Operation . . . . . . . . . . . . . . . . . . . . . . . 78 Operation . . . . . . . . . . . . . . . . . . . . . . . 80
15.9. Operation 67: OFFLOAD_STATUS - Poll for Status of 15.9. Operation 67: OFFLOAD_STATUS - Poll for Status of
Asynchronous Operation . . . . . . . . . . . . . . . . . 79 Asynchronous Operation . . . . . . . . . . . . . . . . . 81
15.10. Operation 68: READ_PLUS - READ Data or Holes from a File 80 15.10. Operation 68: READ_PLUS - READ Data or Holes from a File 82
15.11. Operation 69: SEEK - Find the Next Data or Hole . . . . 85 15.11. Operation 69: SEEK - Find the Next Data or Hole . . . . 87
15.12. Operation 70: WRITE_SAME - WRITE an ADB Multiple Times 15.12. Operation 70: WRITE_SAME - WRITE an ADB Multiple Times
to a File . . . . . . . . . . . . . . . . . . . . . . . 86 to a File . . . . . . . . . . . . . . . . . . . . . . . 88
15.13. Operation 71: CLONE - Clone a range of file into another 15.13. Operation 71: CLONE - Clone a range of file into another
file . . . . . . . . . . . . . . . . . . . . . . . . . . 90 file . . . . . . . . . . . . . . . . . . . . . . . . . . 92
16. NFSv4.2 Callback Operations . . . . . . . . . . . . . . . . . 92 16. NFSv4.2 Callback Operations . . . . . . . . . . . . . . . . . 94
16.1. Operation 15: CB_OFFLOAD - Report results of an 16.1. Operation 15: CB_OFFLOAD - Report results of an
asynchronous operation . . . . . . . . . . . . . . . . . 92 asynchronous operation . . . . . . . . . . . . . . . . . 94
17. Security Considerations . . . . . . . . . . . . . . . . . . . 93 17. Security Considerations . . . . . . . . . . . . . . . . . . . 95
18. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 94 18. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 96
19. References . . . . . . . . . . . . . . . . . . . . . . . . . 94 19. References . . . . . . . . . . . . . . . . . . . . . . . . . 96
19.1. Normative References . . . . . . . . . . . . . . . . . . 94 19.1. Normative References . . . . . . . . . . . . . . . . . . 96
19.2. Informative References . . . . . . . . . . . . . . . . . 94 19.2. Informative References . . . . . . . . . . . . . . . . . 97
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 96 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 98
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 97 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 99
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 97 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 99
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 2 Protocol
The NFS version 4 minor version 2 (NFSv4.2) protocol is the third The NFS version 4 minor version 2 (NFSv4.2) protocol is the third
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is described in [RFC7530] and the second minor version, NFSv4.0, is described in [RFC7530] and the second minor
version, NFSv4.1, is described in [RFC5661]. version, NFSv4.1, is described in [RFC5661].
As a minor version, NFSv4.2 is consistent with the overall goals for As a minor version, NFSv4.2 is consistent with the overall goals for
NFSv4, but extends the protocol so as to better meet those goals, NFSv4, but extends the protocol so as to better meet those goals,
based on experiences with NFSv4.1. In addition, NFSv4.2 has adopted based on experiences with NFSv4.1. In addition, NFSv4.2 has adopted
some additional goals, which motivate some of the major extensions in some additional goals, which motivate some of the major extensions in
NFSv4.2. NFSv4.2.
1.2. Scope of This Document 1.1. Scope of This Document
This document describes the NFSv4.2 protocol. With respect to This document describes the NFSv4.2 protocol. With respect to
NFSv4.0 and NFSv4.1, this document does not: NFSv4.0 and NFSv4.1, this document does not:
o describe the NFSv4.0 or NFSv4.1 protocols, except where needed to o describe the NFSv4.0 or NFSv4.1 protocols, except where needed to
contrast with NFSv4.2 contrast with NFSv4.2
o modify the specification of the NFSv4.0 or NFSv4.1 protocols o modify the specification of the NFSv4.0 or NFSv4.1 protocols
o clarify the NFSv4.0 or NFSv4.1 protocols. I.e., any o clarify the NFSv4.0 or NFSv4.1 protocols, that is any
clarifications made here apply to NFSv4.2 and neither of the prior clarifications made here apply only to NFSv4.2 and neither of the
protocols prior protocols
NFSv4.2 is a superset of NFSv4.1, with all of the new features being
optional. As such, NFSv4.2 maintains the same compatibility that
NFSv4.1 had with NFSv4.0. Any interaction of a new feature with
NFSv4.1 semantics is described in the relevant text.
The full External Data Representation (XDR) [RFC4506] for NFSv4.2 is The full External Data Representation (XDR) [RFC4506] for NFSv4.2 is
presented in [NFSv42xdr]. presented in [I-D.ietf-nfsv4-minorversion2-dot-x].
1.3. NFSv4.2 Goals 1.2. NFSv4.2 Goals
A major goal of the design of NFSv4.2 is to take common local file A major goal of the design of NFSv4.2 is to take common local file
system features and offer them remotely. These features might system features and offer them remotely. These features might
o already be available on the servers, e.g., sparse files o already be available on the servers, e.g., sparse files
o be under development as a new standard, e.g., SEEK pulls in both o be under development as a new standard, e.g., SEEK pulls in both
SEEK_HOLE and SEEK_DATA SEEK_HOLE and SEEK_DATA
o be used by clients with the servers via some proprietary means, o be used by clients with the servers via some proprietary means,
e.g., Labeled NFS e.g., Labeled NFS
NFSv4.2 provides means for clients to leverage these features on the NFSv4.2 provides means for clients to leverage these features on the
server in cases in which that had previously not been possible within server in cases in which that had previously not been possible within
the confines of the NFS protocol. the confines of the NFS protocol.
1.4. Overview of NFSv4.2 Features 1.3. Overview of NFSv4.2 Features
1.4.1. Server Side Copy 1.3.1. Server Side Clone and Copy
A traditional file copy of a remotely accessed, whether from one A traditional file copy of a remotely accessed file, whether from one
server to another or between location in the same server, results in server to another or between locations in the same server, results in
the data being put on the network twice - source to client and then the data being put on the network twice - source to client and then
client to destination. New operations are introduced to allow client to destination. New operations are introduced to allow
unnecessary traffic to be eliminated: unnecessary traffic to be eliminated:
The intra-server copy feature allows the client to request the o The intra-server clone feature allows the client to request a
synchronous cloning, perhaps by copy-on-write semantics.
o The intra-server copy feature allows the client to request the
server to perform the copy internally, avoiding unnecessary server to perform the copy internally, avoiding unnecessary
network traffic. network traffic.
The inter-server copy feature allows the client to authorize the o The inter-server copy feature allows the client to authorize the
source and destination servers to interact directly. source and destination servers to interact directly.
As such copies can be lengthy, asynchronous support is also provided. As such copies can be lengthy, asynchronous support is also provided.
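The traffic saving described above can be tallied with a small sketch. The function name, the mode strings, and the link labels are illustrative assumptions, not protocol elements, and only file data is counted (control messages are ignored).

```python
def network_bytes(file_size, mode):
    """Return the file-data bytes carried on each link for one copy.

    Link labels and mode names are hypothetical; they only mirror the
    three cases described in the text.
    """
    links = {"source->client": 0, "client->destination": 0,
             "source->destination": 0}
    if mode == "traditional":
        # The client reads the data, then writes it back out.
        links["source->client"] = file_size
        links["client->destination"] = file_size
    elif mode == "intra-server":
        # The server copies internally; no file data crosses the network.
        pass
    elif mode == "inter-server":
        # The servers interact directly; the client carries no file data.
        links["source->destination"] = file_size
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return links
```

Under these assumptions a traditional copy of N bytes places 2N bytes on the network, an inter-server copy places N, and an intra-server copy places none.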
1.4.2. Application I/O Advise 1.3.2. Application Input/Output (I/O) Advise
Applications and clients want to advise the server as to expected I/O Applications and clients want to advise the server as to expected I/O
behavior. Using IO_ADVISE (see Section 15.5) to communicate future behavior. Using IO_ADVISE (see Section 15.5) to communicate future
I/O behavior such as whether a file will be accessed sequentially or I/O behavior such as whether a file will be accessed sequentially or
randomly, and whether a file will or will not be accessed in the near randomly, and whether a file will or will not be accessed in the near
future, allows servers to optimize future I/O requests for a file by, future, allows servers to optimize future I/O requests for a file by,
for example, prefetching or evicting data. This operation can be for example, prefetching or evicting data. This operation can be
used to support the posix_fadvise function. In addition, it may be used to support the posix_fadvise [posix_fadvise] function. In
helpful to applications such as databases and video editors. addition, it may be helpful to applications such as databases and
video editors.
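As an illustration of the kind of hint an application might give, the following sketch calls the local posix_fadvise function through Python's os module. It shows only the application side; whether and how an NFS client translates such hints into IO_ADVISE is not claimed here. The helper names advise and demo are assumptions made for this example.

```python
import os
import tempfile


def advise(fd, length, sequential=True, needed_soon=True):
    """Apply local posix_fadvise hints; return the names of hints applied.

    Returns an empty list on platforms without os.posix_fadvise.
    """
    if not hasattr(os, "posix_fadvise"):
        return []
    pattern = "POSIX_FADV_SEQUENTIAL" if sequential else "POSIX_FADV_RANDOM"
    timing = "POSIX_FADV_WILLNEED" if needed_soon else "POSIX_FADV_DONTNEED"
    applied = []
    for name in (pattern, timing):
        # Hint covers bytes [0, length) of the open file.
        os.posix_fadvise(fd, 0, length, getattr(os, name))
        applied.append(name)
    return applied


def demo():
    """Hint that a scratch file will be read sequentially and soon."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        return advise(f.fileno(), 4096)
```

The two hint pairs correspond to the two questions named in the text: sequential versus random access, and access versus no access in the near future.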
1.4.3. Sparse Files 1.3.3. Sparse Files
Sparse files are ones which have unallocated or uninitialized data Sparse files are ones which have unallocated or uninitialized data
blocks as holes in the file. Such holes are typically transferred as blocks as holes in the file. Such holes are typically transferred as
0s during I/O. READ_PLUS (see Section 15.10) allows a server to send 0s when read from the file. READ_PLUS (see Section 15.10) allows a
back to the client metadata describing the hole and DEALLOCATE (see server to send back to the client metadata describing the hole and
Section 15.4) allows the client to punch holes into a file. In DEALLOCATE (see Section 15.4) allows the client to punch holes into a
addition, SEEK (see Section 15.11) is provided to scan for the next file. In addition, SEEK (see Section 15.11) is provided to scan for
hole or data from a given location. the next hole or data from a given location.
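The scan that SEEK enables can be sketched against a local file using the SEEK_DATA and SEEK_HOLE whence values of lseek, which the goals above name as the basis for SEEK. This is a local analogue only, not the NFS operation; the function names are assumptions, and the extents reported depend on the underlying file system, which may report a hole as data.

```python
import errno
import os
import tempfile


def data_segments(path):
    """Return [(start, end), ...] byte ranges reported as data."""
    size = os.path.getsize(path)
    if not hasattr(os, "SEEK_DATA"):
        # Platform cannot report holes; treat the whole file as data.
        return [(0, size)] if size else []
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a hole remains up to end of file
                raise
            # There is an implicit hole at end of file, so this succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments


def demo():
    """Create a file with a leading hole and list its data ranges."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        with open(path, "wb") as f:
            f.seek(1 << 20)      # skip 1 MiB without writing
            f.write(b"data")
        return data_segments(path)
```

A reader that knows these ranges can skip the holes instead of transferring zeros, which is the saving READ_PLUS aims to provide on the wire.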
1.4.4. Space Reservation 1.3.4. Space Reservation
When a file is sparse, one concern applications have is ensuring that When a file is sparse, one concern applications have is ensuring that
there will always be enough data blocks available for the file during there will always be enough data blocks available for the file during
future writes. ALLOCATE (see Section 15.1) allows a client to future writes. ALLOCATE (see Section 15.1) allows a client to
request a guarantee that space will be available. Also DEALLOCATE request a guarantee that space will be available. Also DEALLOCATE
(see Section 15.4) allows the client to punch a hole into a file, (see Section 15.4) allows the client to punch a hole into a file,
thus releasing a space reservation. thus releasing a space reservation.
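A local analogue of requesting such a guarantee is the posix_fallocate function, shown below through Python's os module. Relating posix_fallocate to ALLOCATE is an assumption made for illustration; the sketch does not issue any NFS operation, and the helper names reserve and demo are hypothetical.

```python
import os
import tempfile


def reserve(path, length):
    """Ask the local file system to back bytes [0, length) of path.

    Returns False if the platform lacks posix_fallocate or the
    file system refuses the request.
    """
    if not hasattr(os, "posix_fallocate"):
        return False
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, length)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True


def demo():
    """Reserve 64 KiB for a new file; return (reserved, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reserved.bin")
        ok = reserve(path, 65536)
        return ok, os.path.getsize(path)
```

After a successful call, later writes within the reserved range are not expected to fail for lack of space, which is the concern the text describes for sparse files.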
1.4.5. Application Data Block (ADB) Support 1.3.5. Application Data Block (ADB) Support
Some applications treat a file as if it were a disk and as such want Some applications treat a file as if it were a disk and as such want
to initialize (or format) the file image. We introduce WRITE_SAME to initialize (or format) the file image. We introduce WRITE_SAME
(see Section 15.12) to send this metadata to the server to allow it (see Section 15.12) to send this metadata to the server to allow it
to write the block contents. to write the block contents.
1.4.6. Labeled NFS 1.3.6. Labeled NFS
While both clients and servers can employ Mandatory Access Control While both clients and servers can employ Mandatory Access Control
(MAC) security models to enforce data access, there has been no (MAC) security models to enforce data access, there has been no
protocol support for interoperability. A new file object attribute, protocol support for interoperability. A new file object attribute,
sec_label (see Section 12.2.4) allows for the server to store MAC sec_label (see Section 12.2.4) allows for the server to store MAC
labels on files, which the client retrieves and uses to enforce data labels on files, which the client retrieves and uses to enforce data
access (see Section 9.6.2). The format of the sec_label accommodates access (see Section 9.5.2). The format of the sec_label accommodates
any MAC security system. any MAC security system.
1.5. Enhancements to Minor Versioning Model 1.3.7. Layout Enhancements
In the parallel NFS implementations of NFSv4.1 (see Section 12 of
[RFC5661]), the client cannot communicate back to the metadata server
any errors or performance characteristics with the storage devices.
NFSv4.2 provides two new operations to do so respectively:
LAYOUTERROR (see Section 15.6) and LAYOUTSTATS (see Section 15.7).
1.4. Enhancements to Minor Versioning Model
In NFSv4.1, the only way to introduce new variants of an operation In NFSv4.1, the only way to introduce new variants of an operation
was to introduce a new operation. I.e., READ becomes either READ2 or was to introduce a new operation. For instance, READ would have to
READ_PLUS. With the use of discriminated unions as parameters to be replaced or supplemented by, say, either READ2 or READ_PLUS. With
such functions in NFSv4.2, it is possible to add a new arm in a the use of discriminated unions as parameters to such functions in
subsequent minor version. And it is also possible to move such an NFSv4.2, it is possible to add a new arm in a subsequent minor
operation from OPTIONAL/RECOMMENDED to REQUIRED. Forcing an version. And it is also possible to move such an operation from
implementation to adopt each arm of a discriminated union at such a OPTIONAL/RECOMMENDED to REQUIRED. Forcing an implementation to adopt
time does not meet the spirit of the minor versioning rules. As each arm of a discriminated union at such a time does not meet the
such, new arms of a discriminated union MUST follow the same spirit of the minor versioning rules. As such, new arms of a
guidelines for minor versioning as operations in NFSv4.1 - i.e., they discriminated union MUST follow the same guidelines for minor
may not be made REQUIRED. To support this, a new error code, versioning as operations in NFSv4.1 - i.e., they may not be made
NFS4ERR_UNION_NOTSUPP, allows the server to communicate to the client REQUIRED. To support this, a new error code, NFS4ERR_UNION_NOTSUPP,
that the operation is supported, but the specific arm of the allows the server to communicate to the client that the operation is
discriminated union is not. supported, but the specific arm of the discriminated union is not.
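The distinction drawn above can be sketched as a server-side check. The operation name EXAMPLE_OP and the arm names are hypothetical placeholders, numeric error values are deliberately omitted, and NFS4ERR_NOTSUPP is assumed here as the base protocol's reply for an operation that is not supported at all.

```python
from enum import Enum


class Status(Enum):
    NFS4_OK = "NFS4_OK"
    NFS4ERR_NOTSUPP = "NFS4ERR_NOTSUPP"
    NFS4ERR_UNION_NOTSUPP = "NFS4ERR_UNION_NOTSUPP"


# Hypothetical server: implements EXAMPLE_OP, but only one arm of the
# discriminated union it takes as a parameter.
SUPPORTED_ARMS = {"EXAMPLE_OP": {"ARM_A"}}


def check_request(op, arm):
    """Return the status a server might give for (operation, union arm)."""
    if op not in SUPPORTED_ARMS:
        return Status.NFS4ERR_NOTSUPP          # operation not supported
    if arm not in SUPPORTED_ARMS[op]:
        return Status.NFS4ERR_UNION_NOTSUPP    # operation yes, arm no
    return Status.NFS4_OK
```

With this split, a client that receives NFS4ERR_UNION_NOTSUPP can keep using the operation with a different arm rather than abandoning it.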
2. Minor Versioning 2. Minor Versioning
NFSv4.2 is a minor version of NFSv4 and is built upon NFSv4.1 as NFSv4.2 is a minor version of NFSv4 and is built upon NFSv4.1 as
documented in [RFC5661] and [RFC5662]. documented in [RFC5661] and [RFC5662].
NFSv4.2 does not modify the rules applicable to the NFSv4 versioning NFSv4.2 does not modify the rules applicable to the NFSv4 versioning
process and follows the rules set out in [RFC5661] or in standard- process and follows the rules set out in [RFC5661] or in standard-
track documents updating that document (e.g., in an RFC based on track documents updating that document (e.g., in an RFC based on
[NFSv4-Versioning]). [NFSv4-Versioning]).
skipping to change at page 8, line 6 skipping to change at page 8, line 22
o change the status of existing features (i.e., by changing their o change the status of existing features (i.e., by changing their
status among OPTIONAL, RECOMMENDED, REQUIRED). status among OPTIONAL, RECOMMENDED, REQUIRED).
The following versioning-related considerations should be noted. The following versioning-related considerations should be noted.
o When a new case is added to an existing switch, servers need to o When a new case is added to an existing switch, servers need to
report non-support of that new case by returning report non-support of that new case by returning
NFS4ERR_UNION_NOTSUPP. NFS4ERR_UNION_NOTSUPP.
o As regards the potential cross-minor-version transfer of stateids, o As regards the potential cross-minor-version transfer of stateids,
pNFS implementations of the file mapping type may support use Parallel NFS (pNFS) (see Section 12 of [RFC5661]) implementations
of an NFSv4.2 metadata server with NFSv4.1 data servers. In this of the file mapping type may support use of an NFSv4.2 metadata
context, a stateid returned by an NFSv4.2 COMPOUND will be used in server (see Sections 1.7.2.2 and 12.2.2 of [RFC5661]) with NFSv4.1
an NFSv4.1 COMPOUND directed to the data server (see Sections 3.2 data servers. In this context, a stateid returned by an NFSv4.2
and 3.3). COMPOUND will be used in an NFSv4.1 COMPOUND directed to the data
server (see Sections 3.2 and 3.3).
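As an illustration of the first consideration above, the following minimal sketch shows a server that implements an operation but rejects one arm of its discriminated union. Only the error name NFS4ERR_UNION_NOTSUPP comes from this document; the function dispatch_arm, the handler table, and the arm names are hypothetical, and status values are modeled as plain strings rather than wire values.

```python
# Hypothetical sketch: status names are plain strings, not wire values.
NFS4_OK = "NFS4_OK"
NFS4ERR_UNION_NOTSUPP = "NFS4ERR_UNION_NOTSUPP"


def dispatch_arm(supported_arms, arm, payload):
    """Return (status, result) for one arm of a discriminated union."""
    handler = supported_arms.get(arm)
    if handler is None:
        # The operation itself is implemented, but this arm is not.
        return NFS4ERR_UNION_NOTSUPP, None
    return NFS4_OK, handler(payload)


# Example: a server that implements only a "data" arm (illustrative name).
arms = {"data": lambda payload: len(payload)}
```

A client receiving NFS4ERR_UNION_NOTSUPP can conclude that retrying with a different arm of the same operation may still succeed.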
3. pNFS considerations for New Operations 3. pNFS considerations for New Operations
The interactions of the new operations with non-pNFS functionality are
straightforward and covered in the relevant sections. However, the
interactions of the new operations with pNFS are more complicated and
this section provides an overview.
3.1. Atomicity for ALLOCATE and DEALLOCATE 3.1. Atomicity for ALLOCATE and DEALLOCATE
Both ALLOCATE (see Section 15.1) and DEALLOCATE (see Section 15.4) Both ALLOCATE (see Section 15.1) and DEALLOCATE (see Section 15.4)
are sent to the metadata server, which is responsible for are sent to the metadata server, which is responsible for
coordinating the changes onto the storage devices. In particular, coordinating the changes onto the storage devices. In particular,
both operations must either fully succeed or fail; it cannot be the both operations must either fully succeed or fail; it cannot be the
case that one storage device succeeds whilst another fails. case that one storage device succeeds whilst another fails.
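The all-or-nothing requirement can be sketched as follows. This document does not specify how the metadata server coordinates the storage devices, so the rollback approach, the function apply_atomically, and the apply/undo callbacks are assumptions made purely for illustration.

```python
def apply_atomically(devices, apply, undo):
    """Apply a change to every device, or leave all of them unchanged.

    `apply` and `undo` are hypothetical caller-supplied callbacks that
    change, or revert the change on, a single storage device.
    """
    done = []
    for dev in devices:
        try:
            apply(dev)
        except Exception:
            # Roll back the devices already changed, newest first.
            for prev in reversed(done):
                undo(prev)
            return False
        done.append(dev)
    return True
```

A real implementation would also have to cope with a failing undo step; the sketch ignores that case.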
3.2. Sharing of stateids with NFSv4.1 3.2. Sharing of stateids with NFSv4.1
skipping to change at page 8, line 35 skipping to change at page 9, line 12
device. Section 13.9.1 of [RFC5661] discusses how the client gets a device. Section 13.9.1 of [RFC5661] discusses how the client gets a
stateid from the metadata server to present to a storage device. stateid from the metadata server to present to a storage device.
3.3. NFSv4.2 as a Storage Protocol in pNFS: the File Layout Type 3.3. NFSv4.2 as a Storage Protocol in pNFS: the File Layout Type
A file layout provided by a NFSv4.2 server may refer either to a A file layout provided by a NFSv4.2 server may refer either to a
storage device that only implements NFSv4.1 as specified in storage device that only implements NFSv4.1 as specified in
[RFC5661], or to a storage device that implements additions from [RFC5661], or to a storage device that implements additions from
NFSv4.2, in which case the rules in Section 3.3.1 apply. As the File NFSv4.2, in which case the rules in Section 3.3.1 apply. As the File
Layout Type does not provide a means for informing the client as to Layout Type does not provide a means for informing the client as to
which minor version a particular storage device is providing, it will which minor version a particular storage device is providing, the
have to negotiate this via the normal RPC semantics of major and client will have to negotiate this with the storage device via the
minor version discovery. normal Remote Procedure Call (RPC) semantics of major and minor
version discovery. E.g., as per Section 16.2.3 of [RFC5661], the
client could try a COMPOUND with a minorversion of 2 and if it gets
NFS4ERR_MINOR_VERS_MISMATCH, drop back to 1.
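The fallback described above might be sketched as below. The function negotiate_minor_version and its send_compound callback are hypothetical stand-ins for a client's RPC layer, and status values are modeled as strings; only the error name and the try-2-then-1 order come from the text.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_MINOR_VERS_MISMATCH = "NFS4ERR_MINOR_VERS_MISMATCH"


def negotiate_minor_version(send_compound, candidates=(2, 1)):
    """Return (minor_version, status) for the first version accepted.

    send_compound(minorversion) is a caller-supplied (hypothetical)
    function that sends a COMPOUND to the storage device and returns
    the resulting status.
    """
    for minor in candidates:
        status = send_compound(minor)
        if status != NFS4ERR_MINOR_VERS_MISMATCH:
            # Any other status means the minor version was understood.
            return minor, status
    return None, NFS4ERR_MINOR_VERS_MISMATCH
```

A client would typically remember the result per storage device rather than probing on every request.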
3.3.1. Operations Sent to NFSv4.2 Data Servers 3.3.1. Operations Sent to NFSv4.2 Data Servers
In addition to the commands listed in [RFC5661], NFSv4.2 data servers In addition to the commands listed in [RFC5661], NFSv4.2 data servers
MAY accept a COMPOUND containing the following additional operations: MAY accept a COMPOUND containing the following additional operations:
IO_ADVISE (see Section 15.5), READ_PLUS (see Section 15.10), IO_ADVISE (see Section 15.5), READ_PLUS (see Section 15.10),
WRITE_SAME (see Section 15.12), and SEEK (see Section 15.11), which WRITE_SAME (see Section 15.12), and SEEK (see Section 15.11), which
will be treated like the subset specified as "Operations Sent to will be treated like the subset specified as "Operations Sent to
NFSv4.1 Data Servers" in Section 13.6 of [RFC5661]. NFSv4.1 Data Servers" in Section 13.6 of [RFC5661].
Additional details on the implementation of these operations in a Additional details on the implementation of these operations in a
pNFS context are documented in the operation specific sections. pNFS context are documented in the operation specific sections.
4. Server Side Copy 4. Server Side Copy
4.1. Introduction 4.1. Introduction
The server-side copy feature provides a mechanism for the NFS client The server-side copy features provide mechanisms which allow an NFS
to perform a file copy on a server or between two servers without the client to copy file data on a server or between two servers without
data being transmitted back and forth over the network through the the data being transmitted back and forth over the network through
NFS client. Without this feature, an NFS client copies data from one the NFS client. Without these features, an NFS client would copy
location to another by reading the data from the source server over data from one location to another by reading the data from the source
the network, and then writing the data back over the network to the server over the network, and then writing the data back over the
destination server. network to the destination server.
If the source object and destination object are on different file If the source object and destination object are on different file
servers, the file servers will communicate with one another to servers, the file servers will communicate with one another to
perform the copy operation. The server-to-server protocol by which perform the copy operation. The server-to-server protocol by which
this is accomplished is not defined in this document. this is accomplished is not defined in this document.
4.2. Protocol Overview 4.2. Protocol Overview
The server-side copy offload operations support both intra-server and The server-side copy offload operations support both intra-server and
inter-server file copies. An intra-server copy is a copy in which inter-server file copies. An intra-server copy is a copy in which
the source file and destination file reside on the same server. In the source file and destination file reside on the same server. In
an inter-server copy, the source file and destination file are on an inter-server copy, the source file and destination file are on
different servers. In both cases, the copy may be performed different servers. In both cases, the copy may be performed
synchronously or asynchronously. synchronously or asynchronously.
In addition, the CLONE operation provides copy-like functionality in
the intra-server case which is both synchronous and atomic, in that
other operations may not see the target file in any state between
that before the clone operation and after it.
Throughout the rest of this document, we refer to the NFS server Throughout the rest of this document, we refer to the NFS server
containing the source file as the "source server" and the NFS server containing the source file as the "source server" and the NFS server
to which the file is transferred as the "destination server". In the to which the file is transferred as the "destination server". In the
case of an intra-server copy, the source server and destination case of an intra-server copy, the source server and destination
server are the same server. Therefore in the context of an intra- server are the same server. Therefore in the context of an intra-
server copy, the terms source server and destination server refer to server copy, the terms source server and destination server refer to
the single server performing the copy. the single server performing the copy.
The new operations are designed to copy files. Other file system The new operations are designed to copy files or regions within them.
objects can be copied by building on these operations or using other Other file system objects can be copied by building on these
techniques. For example, if the user wishes to copy a directory, the operations or using other techniques. For example, if a user wishes
client can synthesize a directory copy by first creating the to copy a directory, the client can synthesize a directory copy
destination directory and then copying the source directory's files operation by first creating the destination directory and the
to the new destination directory. individual (empty) files within it, and then copying the contents of
the source directory's files to files in the new destination
directory.
For the inter-server copy, the operations are defined to be For the inter-server copy, the operations are defined to be
compatible with the traditional copy authentication approach. The compatible with the traditional copy authorization approach. The
client and user are authorized at the source for reading. Then they client and user are authorized at the source for reading. Then they
are authorized at the destination for writing. are authorized at the destination for writing.
4.2.1. Copy Operations 4.2.1. Copy Operations
COPY_NOTIFY: Used by the client to notify the source server of a CLONE: Used by the client to request a synchronous atomic copy-like
future file copy from a given destination server for the given operation. (Section 15.13)
user. (Section 15.3)
COPY_NOTIFY: Used by the client to request the source server to
authorize a future file copy that will be made by a given
destination server on behalf of the given user. (Section 15.3)
COPY: Used by the client to request a file copy. (Section 15.2) COPY: Used by the client to request a file copy. (Section 15.2)
OFFLOAD_CANCEL: Used by the client to terminate an asynchronous file OFFLOAD_CANCEL: Used by the client to terminate an asynchronous file
copy. (Section 15.8) copy. (Section 15.8)
OFFLOAD_STATUS: Used by the client to poll the status of an OFFLOAD_STATUS: Used by the client to poll the status of an
asynchronous file copy. (Section 15.9) asynchronous file copy. (Section 15.9)
CB_OFFLOAD: Used by the destination server to report the results of CB_OFFLOAD: Used by the destination server to report the results of
an asynchronous file copy to the client. (Section 16.1) an asynchronous file copy to the client. (Section 16.1)
4.2.2. Requirements for Operations 4.2.2. Requirements for Operations
The implementation of server-side copy is OPTIONAL by the client and Three OPTIONAL features are provided relative to server-side copy. A
the server. However, in order to successfully copy a file, some server may choose independently to implement any of them. A server
operations MUST be supported by the client and/or server. implementing any of these features may be REQUIRED to implement
certain operations. Other operations are OPTIONAL in the context of
a particular feature (see Section 13), but may become REQUIRED depending on
server behavior. Clients need to use these operations to
successfully copy a file.
If a client desires an intra-server file copy, then it MUST support For a client to do an intra-server file copy, it needs to use either
the COPY and CB_OFFLOAD operations. If COPY returns a stateid, then the COPY or the CLONE operation. If COPY is used the client MUST
the client MAY use the OFFLOAD_CANCEL and OFFLOAD_STATUS operations. support the CB_OFFLOAD operation. If COPY is used and it returns a
stateid, then the client MAY use the OFFLOAD_CANCEL and
OFFLOAD_STATUS operations.
If a client desires an inter-server file copy, then it MUST support For a client to do an inter-server file copy, it needs to use
the COPY, COPY_NOTIFY, and CB_OFFLOAD operations, and MAY use the the COPY and COPY_NOTIFY operations and MUST support the CB_OFFLOAD
OFFLOAD_CANCEL operation. If COPY returns a stateid, then the client operation. If COPY returns a stateid, then the client MAY use the
MAY use the OFFLOAD_CANCEL and OFFLOAD_STATUS operations. OFFLOAD_CANCEL and OFFLOAD_STATUS operations.
If a server supports intra-server copy, then the server MUST support If a server supports the intra-server copy feature, then the server MUST
the COPY operation. If a server's COPY operation returns a stateid, support the COPY operation. If a server's COPY operation returns a
then the server MUST also support these operations: CB_OFFLOAD, stateid, then the server MUST also support these operations:
OFFLOAD_CANCEL, and OFFLOAD_STATUS. CB_OFFLOAD, OFFLOAD_CANCEL, and OFFLOAD_STATUS.
If a source server supports inter-server copy, then the source server If a server supports the clone feature, then it MUST support the
MUST support all these operations: COPY_NOTIFY and OFFLOAD_CANCEL. CLONE operation and the clone_blksize attribute on any filesystem on
If a destination server supports inter-server copy, then the which CLONE is supported (as either source or destination file).
destination server MUST support the COPY operation. If a destination
server's COPY operation returns a stateid, then the destination If a source server supports the inter-server copy feature, then it MUST
server MUST also support these operations: CB_OFFLOAD, support the operations COPY_NOTIFY and OFFLOAD_CANCEL. If a
OFFLOAD_CANCEL, COPY_NOTIFY, and OFFLOAD_STATUS. destination server supports the inter-server copy feature, then it MUST
support the COPY operation. If a destination server's COPY operation
returns a stateid, then the destination server MUST also support
these operations: CB_OFFLOAD, OFFLOAD_CANCEL, COPY_NOTIFY, and
OFFLOAD_STATUS.
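The server-side requirements in the newer text above can be summarized as a lookup table. This is a reading aid under stated assumptions, not normative text: the feature labels and the function missing_operations are hypothetical, and the clone_blksize attribute that the clone feature also requires is not modeled because it is an attribute rather than an operation.

```python
# Operations a server MUST support for each feature, per the text above.
SERVER_REQUIRED = {
    "intra-server copy": {"COPY"},
    "clone": {"CLONE"},
    "inter-server copy (source)": {"COPY_NOTIFY", "OFFLOAD_CANCEL"},
    "inter-server copy (destination)": {"COPY"},
}

# Further operations required once the server's COPY returns a stateid.
ASYNC_EXTRA = {
    "intra-server copy": {"CB_OFFLOAD", "OFFLOAD_CANCEL", "OFFLOAD_STATUS"},
    "inter-server copy (destination)": {
        "CB_OFFLOAD", "OFFLOAD_CANCEL", "COPY_NOTIFY", "OFFLOAD_STATUS",
    },
}


def missing_operations(feature, implemented, returns_stateid=False):
    """List required operations that `implemented` lacks, sorted by name."""
    needed = set(SERVER_REQUIRED[feature])
    if returns_stateid:
        needed |= ASYNC_EXTRA.get(feature, set())
    return sorted(needed - set(implemented))
```

For example, a server that implements only COPY satisfies the intra-server copy feature as long as its COPY never returns a stateid.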
Each operation is performed in the context of the user identified by Each operation is performed in the context of the user identified by
the ONC RPC credential of its containing COMPOUND or CB_COMPOUND the Open Network Computing (ONC) RPC credential of its containing
request. For example, an OFFLOAD_CANCEL operation issued by a given COMPOUND or CB_COMPOUND request. For example, an OFFLOAD_CANCEL
user indicates that a specified COPY operation initiated by the same operation issued by a given user indicates that a specified COPY
user be canceled. Therefore an OFFLOAD_CANCEL MUST NOT interfere operation initiated by the same user be canceled. Therefore an
with a copy of the same file initiated by another user. OFFLOAD_CANCEL MUST NOT interfere with a copy of the same file
initiated by another user.
An NFS server MAY allow an administrative user to monitor or cancel An NFS server MAY allow an administrative user to monitor or cancel
copy operations using an implementation specific interface. copy operations using an implementation specific interface.
4.3. Requirements for Inter-Server Copy 4.3. Requirements for Inter-Server Copy
Inter-server copy is driven by several requirements: The specification of inter-server copy is driven by several
requirements:
o The specification MUST NOT mandate the server-to-server protocol. o The specification MUST NOT mandate the server-to-server protocol.
o The specification MUST provide guidance for using NFSv4.x as a o The specification MUST provide guidance for using NFSv4.x as a
copy protocol. For those source and destination servers willing copy protocol. For those source and destination servers willing
to use NFSv4.x, there are specific security considerations that to use NFSv4.x, there are specific security considerations that
this specification MUST address. this specification MUST address.
o The specification MUST NOT mandate preconfiguration between the o The specification MUST NOT mandate preconfiguration between the
source and destination server. Requiring that the source and source and destination server. Requiring that the source and
skipping to change at page 11, line 47 skipping to change at page 12, line 45
Both the source and destination file may need to be locked to protect Both the source and destination file may need to be locked to protect
the content during the copy operations. A client can achieve this by the content during the copy operations. A client can achieve this by
a combination of OPEN and LOCK operations. I.e., either share or a combination of OPEN and LOCK operations. I.e., either share or
byte range locks might be desired. byte range locks might be desired.
Note that when the client establishes a lock stateid on the source, Note that when the client establishes a lock stateid on the source,
the context of that stateid is for the client and not the the context of that stateid is for the client and not the
destination. As such, there might already be an outstanding stateid, destination. As such, there might already be an outstanding stateid,
issued to the destination as client of the source, with the same issued to the destination as client of the source, with the same
value as that provided for the lock stateid. The source MUST equate value as that provided for the lock stateid. The source MUST
the lock stateid as that of the client, i.e., when the destination interpret the lock stateid as that of the client, i.e., when the
presents it in the context of an inter-server copy, it is on behalf of destination presents it in the context of an inter-server copy, it is
the client. on behalf of the client.
4.4.2. Client Caches 4.4.2. Client Caches
In a traditional copy, if the client is in the process of writing to In a traditional copy, if the client is in the process of writing to
the file before the copy (and perhaps with a write delegation), it the file before the copy (and perhaps with a write delegation), it
will be straightforward to update the destination server. With an will be straightforward to update the destination server. With an
inter-server copy, the source has no insight into the changes cached inter-server copy, the source has no insight into the changes cached
on the client. The client SHOULD write back the data to the source. on the client. The client SHOULD write back the data to the source.
If it does not do so, it is possible that the destination will If it does not do so, it is possible that the destination will
receive a corrupt copy of the file. receive a corrupt copy of the file.
skipping to change at page 15, line 8 skipping to change at page 16, line 8
the destination server chooses to perform the copy before responding the destination server chooses to perform the copy before responding
to the client's COPY request. to the client's COPY request.
An asynchronous copy is shown in Figure 5. In this case, the An asynchronous copy is shown in Figure 5. In this case, the
destination server chooses to respond to the client's COPY request destination server chooses to respond to the client's COPY request
immediately and then perform the copy asynchronously. immediately and then perform the copy asynchronously.
Client Source Destination Client Source Destination
+ + + + + +
| | | | | |
|--- OPEN --->| | Returns os1 |--- OPEN --->| | Returns
|<------------------/| | |<------------------/| | open state os1
| | | | | |
|--- COPY_NOTIFY --->| | |--- COPY_NOTIFY --->| |
|<------------------/| | |<------------------/| |
| | | | | |
|--- OPEN ---------------------------->| Returns os2 |--- OPEN ---------------------------->| Returns
|<------------------------------------/| |<------------------------------------/| open state os2
| | | | | |
|--- COPY ---------------------------->| |--- COPY ---------------------------->|
| | | | | |
| | | | | |
| |<----- read -----| | |<----- read -----|
| |\--------------->| | |\--------------->|
| | | | | |
| | . | Multiple reads may | | . | Multiple reads may
| | . | be necessary | | . | be necessary
| | . | | | . |
| | | | | |
| | | | | |
|<------------------------------------/| Destination replies |<------------------------------------/| Destination replies
| | | to COPY | | | to COPY
| | | | | |
|--- CLOSE --------------------------->| Release open state |--- CLOSE --------------------------->| Release os2
|<------------------------------------/| |<------------------------------------/|
| | | | | |
|--- CLOSE --->| | Release open state |--- CLOSE --->| | Release os1
|<------------------/| | |<------------------/| |
Figure 4: A synchronous inter-server copy. Figure 4: A synchronous inter-server copy.
Client Source Destination Client Source Destination
+ + + + + +
| | | | | |
|--- OPEN --->| | Returns os1 |--- OPEN --->| | Returns
|<------------------/| | |<------------------/| | open state os1
| | | | | |
|--- LOCK --->| | Optional, could be done |--- LOCK --->| | Optional, could be done
|<------------------/| | with a share lock |<------------------/| | with a share lock
| | | | | |
|--- COPY_NOTIFY --->| | Need to pass in |--- COPY_NOTIFY --->| | Need to pass in
|<------------------/| | os1 or lock state |<------------------/| | os1 or lock state
| | | | | |
| | | | | |
| | | | | |
|--- OPEN ---------------------------->| Returns os2 |--- OPEN ---------------------------->| Returns
|<------------------------------------/| |<------------------------------------/| open state os2
| | | | | |
|--- LOCK ---------------------------->| Optional ... |--- LOCK ---------------------------->| Optional ...
|<------------------------------------/| |<------------------------------------/|
| | | | | |
|--- COPY ---------------------------->| Need to pass in |--- COPY ---------------------------->| Need to pass in
|<------------------------------------/| os2 or lock state |<------------------------------------/| os2 or lock state
| | | | | |
| | | | | |
| |<----- read -----| | |<----- read -----|
| |\--------------->| | |\--------------->|
skipping to change at page 16, line 37 skipping to change at page 17, line 37
| | . | | | . |
| | | | | |
| | | | | |
| | | | | |
|<-- CB_OFFLOAD -----------------------| Destination reports |<-- CB_OFFLOAD -----------------------| Destination reports
|\------------------------------------>| results |\------------------------------------>| results
| | | | | |
|--- LOCKU --------------------------->| Only if LOCK was done |--- LOCKU --------------------------->| Only if LOCK was done
|<------------------------------------/| |<------------------------------------/|
| | | | | |
|--- CLOSE --------------------------->| Release open state |--- CLOSE --------------------------->| Release os2
|<------------------------------------/| |<------------------------------------/|
| | | | | |
|--- LOCKU --->| | Only if LOCK was done |--- LOCKU --->| | Only if LOCK was done
|<------------------/| | |<------------------/| |
| | | | | |
|--- CLOSE --->| | Release open state |--- CLOSE --->| | Release os1
|<------------------/| | |<------------------/| |
| | | | | |
Figure 5: An asynchronous inter-server copy. Figure 5: An asynchronous inter-server copy.
4.7. Server-to-Server Copy Protocol 4.7. Server-to-Server Copy Protocol
The choice of what protocol to use in an inter-server copy is The choice of what protocol to use in an inter-server copy is
ultimately the destination server's decision. However, the ultimately the destination server's decision. However, the
destination server has to be cognizant that it is working on behalf destination server has to be cognizant that it is working on behalf
skipping to change at page 17, line 31 skipping to change at page 18, line 31
requirements. requirements.
4.7.2. Using NFSv4.x as the Copy Protocol 4.7.2. Using NFSv4.x as the Copy Protocol
The destination server MAY use standard NFSv4.x (where x >= 1) The destination server MAY use standard NFSv4.x (where x >= 1)
operations to read the data from the source server. If NFSv4.x is operations to read the data from the source server. If NFSv4.x is
used for the server-to-server copy protocol, the destination server used for the server-to-server copy protocol, the destination server
can use the source filehandle and ca_src_stateid provided in the COPY can use the source filehandle and ca_src_stateid provided in the COPY
request with standard NFSv4.x operations to read data from the source request with standard NFSv4.x operations to read data from the source
server. Note that the ca_src_stateid MUST be the cnr_stateid server. Note that the ca_src_stateid MUST be the cnr_stateid
returned from the source via the COPY_NOTIFY. returned from the source via the COPY_NOTIFY (Section 15.3).
4.7.3. Using an Alternative Copy Protocol 4.7.3. Using an Alternative Copy Protocol
In a homogeneous environment, the source and destination servers In a homogeneous environment, the source and destination servers
might be able to perform the file copy extremely efficiently using might be able to perform the file copy extremely efficiently using
specialized protocols. For example the source and destination specialized protocols. For example the source and destination
servers might be two nodes sharing a common file system format for servers might be two nodes sharing a common file system format for
the source and destination file systems. Thus the source and the source and destination file systems. Thus the source and
destination are in an ideal position to efficiently render the image destination are in an ideal position to efficiently render the image
of the source file to the destination file by replicating the file of the source file to the destination file by replicating the file
system formats at the block level. Another possibility is that the system formats at the block level. Another possibility is that the
source and destination might be two nodes sharing a common storage source and destination might be two nodes sharing a common storage
area network, and thus there is no need to copy any data at all, and area network, and thus there is no need to copy any data at all, and
instead ownership of the file and its contents might simply be re- instead ownership of the file and its contents might simply be re-
assigned to the destination. To allow for these possibilities, the assigned to the destination. To allow for these possibilities, the
destination server is allowed to use a server-to-server copy protocol destination server is allowed to use a server-to-server copy protocol
of its choice. of its choice.
In a heterogeneous environment, using a protocol other than NFSv4.x In a heterogeneous environment, using a protocol other than NFSv4.x
(e.g., HTTP [RFC2616] or FTP [RFC959]) presents some challenges. In (e.g., HTTP [RFC7230] or FTP [RFC959]) presents some challenges. In
particular, the destination server is presented with the challenge of particular, the destination server is presented with the challenge of
accessing the source file given only an NFSv4.x filehandle. accessing the source file given only an NFSv4.x filehandle.
One option for protocols that identify source files with path names One option for protocols that identify source files with path names
is to use an ASCII hexadecimal representation of the source is to use an ASCII hexadecimal representation of the source
filehandle as the file name. filehandle as the file name.
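The hexadecimal naming option could look like the sketch below. The function names are hypothetical, and the choice of lowercase digits with no prefix or separator is an assumption; the text only calls for an ASCII hexadecimal representation.

```python
def filehandle_to_name(filehandle: bytes) -> str:
    """Encode an opaque filehandle as an ASCII hexadecimal file name."""
    return filehandle.hex()


def name_to_filehandle(name: str) -> bytes:
    """Recover the opaque filehandle from its hexadecimal file name."""
    return bytes.fromhex(name)
```

Because the mapping is reversible, the source server can recover the filehandle from the path name presented by the destination.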
Another option for the source server is to use URLs to direct the Another option for the source server is to use URLs to direct the
destination server to a specialized service. For example, the destination server to a specialized service. For example, the
response to COPY_NOTIFY could include the URL ftp:// response to COPY_NOTIFY could include the URL ftp://
skipping to change at page 18, line 50 skipping to change at page 19, line 50
union netloc4 switch (netloc_type4 nl_type) { union netloc4 switch (netloc_type4 nl_type) {
case NL4_NAME: utf8str_cis nl_name; case NL4_NAME: utf8str_cis nl_name;
case NL4_URL: utf8str_cis nl_url; case NL4_URL: utf8str_cis nl_url;
case NL4_NETADDR: netaddr4 nl_addr; case NL4_NETADDR: netaddr4 nl_addr;
}; };
<CODE ENDS> <CODE ENDS>
If the netloc4 is of type NL4_NAME, the nl_name field MUST be If the netloc4 is of type NL4_NAME, the nl_name field MUST be
specified as a UTF-8 string. The nl_name is expected to be resolved specified as a UTF-8 string. The nl_name is expected to be resolved
to a network address via DNS, LDAP, NIS, /etc/hosts, or some other to a network address via DNS, Lightweight Directory Access Protocol
(LDAP), Network Information Service (NIS), /etc/hosts, or some other
means. If the netloc4 is of type NL4_URL, a server URL [RFC3986] means. If the netloc4 is of type NL4_URL, a server URL [RFC3986]
appropriate for the server-to-server copy operation is specified as a appropriate for the server-to-server copy operation is specified as a
UTF-8 string. If the netloc4 is of type NL4_NETADDR, the nl_addr UTF-8 string. If the netloc4 is of type NL4_NETADDR, the nl_addr
field MUST contain a valid netaddr4 as defined in Section 3.3.9 of field MUST contain a valid netaddr4 as defined in Section 3.3.9 of
[RFC5661]. [RFC5661].
When netloc4 values are used for an inter-server copy as shown in When netloc4 values are used for an inter-server copy as shown in
Figure 3, their values may be evaluated on the source server, Figure 3, their values may be evaluated on the source server,
destination server, and client. The network environment in which destination server, and client. The network environment in which
these systems operate should be configured so that the netloc4 values these systems operate should be configured so that the netloc4 values
skipping to change at page 19, line 27 skipping to change at page 20, line 28
A server may perform a copy offload operation asynchronously. An A server may perform a copy offload operation asynchronously. An
asynchronous copy is tracked using a copy offload stateid. Copy asynchronous copy is tracked using a copy offload stateid. Copy
offload stateids are included in the COPY, OFFLOAD_CANCEL, offload stateids are included in the COPY, OFFLOAD_CANCEL,
OFFLOAD_STATUS, and CB_OFFLOAD operations. OFFLOAD_STATUS, and CB_OFFLOAD operations.
A copy offload stateid will be valid until either (A) the client or A copy offload stateid will be valid until either (A) the client or
server restarts or (B) the client returns the resource by issuing an server restarts or (B) the client returns the resource by issuing an
OFFLOAD_CANCEL operation or the client replies to a CB_OFFLOAD OFFLOAD_CANCEL operation or the client replies to a CB_OFFLOAD
operation. operation.
A copy offload stateid's seqid MUST NOT be 0. In the context of a A copy offload stateid's seqid MUST NOT be zero. In the context of a
copy offload operation, it is ambiguous to indicate the most recent copy offload operation, it is ambiguous to indicate the most recent
copy offload operation using a stateid with seqid of 0. Therefore a copy offload operation using a stateid with seqid of zero. Therefore
copy offload stateid with seqid of 0 MUST be considered invalid. a copy offload stateid with seqid of zero MUST be considered invalid.
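The seqid rule amounts to a one-line check, sketched below. The Stateid class mirrors the two fields of the NFSv4.1 stateid4 structure (a 32-bit seqid and 12 opaque bytes); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    seqid: int    # 32-bit sequence number
    other: bytes  # 12 opaque bytes chosen by the server


def is_valid_copy_offload_stateid(sid: Stateid) -> bool:
    """A copy offload stateid with a seqid of zero is invalid."""
    return sid.seqid != 0
```

The check covers only the rule stated here; a server would still need to confirm that the stateid names a copy it knows about.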
4.10. Security Considerations

The security considerations pertaining to NFSv4.1 [RFC5661] apply to
this section. And as such, the standard security mechanisms used by
the protocol can be used to secure the server-to-server operations.

NFSv4 clients and servers supporting the inter-server copy operations
described in this chapter are REQUIRED to implement the mechanism
described in Section 4.10.1.1, and to support rejecting COPY_NOTIFY
requests that do not use RPCSEC_GSS with privacy. If the
server-to-server copy protocol is ONC RPC based, the servers are also
REQUIRED to implement [rpcsec_gssv3] including the RPCSEC_GSSv3
copy_to_auth, copy_from_auth, and copy_confirm_auth structured
privileges. This requirement to implement is not a requirement to
use; for example, a server may, depending on configuration, also
allow COPY_NOTIFY requests that use only AUTH_SYS.

If a server requires the use of RPCSEC_GSSv3 copy_to_auth,
copy_from_auth, or copy_confirm_auth and it is not used, the server
will reject the request with NFS4ERR_PARTNER_NO_AUTH.
4.10.1. Inter-Server Copy Security

4.10.1.1. Inter-Server Copy via ONC RPC with RPCSEC_GSSv3

When the client sends a COPY_NOTIFY to the source server to expect
the destination to attempt to copy data from the source server, it is
expected that this copy is being done on behalf of the principal
(called the "user principal") that sent the RPC request that encloses
the COMPOUND procedure that contains the COPY_NOTIFY operation. The
user principal is identified by the RPC credentials. A mechanism
skipping to change at page 21, line 16 skipping to change at page 22, line 19
struct copy_from_auth_priv {
        secret4             cfap_shared_secret;
        netloc4             cfap_destination;
        /* the NFSv4 user name that the user principal maps to */
        utf8str_mixed       cfap_username;
};

<CODE ENDS>

cfap_shared_secret is an automatically generated random number
secret value.
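
As a minimal sketch of how a client might fill in this structure, the
following assumes a 32-byte secret and represents netloc4 as a plain
string. Both choices, the class name, and the helper name are
assumptions made for illustration and are not taken from the
specification.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyFromAuthPriv:
    """Simplified stand-in for the XDR struct copy_from_auth_priv."""
    cfap_shared_secret: bytes  # secret4
    cfap_destination: str      # simplified stand-in for netloc4
    cfap_username: str         # NFSv4 user name the user principal maps to


def new_copy_from_auth_priv(destination: str, username: str,
                            secret_len: int = 32) -> CopyFromAuthPriv:
    """Build the privilege data with a freshly generated random secret."""
    # secrets.token_bytes draws from the operating system's CSPRNG.
    secret = secrets.token_bytes(secret_len)
    return CopyFromAuthPriv(secret, destination, username)
```

The secret length is a local policy choice in this sketch; what
matters is that the value is unpredictable and generated anew for
each copy.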
copy_to_auth: A user principal is authorizing a destination
principal ("nfs@<destination>") to set up a copy_confirm_auth
privilege with a source principal ("nfs@<source>") to allow it to
copy a file from the source to the destination on behalf of the
user principal. This privilege is established on the destination
server before the user principal sends a COPY operation to the
destination server, and the resultant RPCSEC_GSSv3 context is used
to secure the COPY operation.
skipping to change at page 22, line 36 skipping to change at page 23, line 40
privilege will use the context established between the user
principal and the destination server used to OPEN the destination
file as the RPCSEC_GSSv3 parent handle.

o  A random number is generated to use as a secret to be shared
   between the two servers. This shared secret will be placed in the
   cfap_shared_secret and ctap_shared_secret fields of the
   appropriate privilege data types, copy_from_auth_priv and
   copy_to_auth_priv. Because of this shared_secret the
   RPCSEC_GSS3_CREATE control messages for copy_from_auth and
   copy_to_auth MUST use a Quality of Protection (QOP) of
   rpc_gss_svc_privacy.

o  An instance of copy_from_auth_priv is filled in with the shared
   secret, the destination server, and the NFSv4 user id of the user
   principal and is placed in rpc_gss3_create_args
   assertions[0].privs.privilege. The string "copy_from_auth" is
   placed in assertions[0].privs.name. The source server unwraps the
   rpc_gss_svc_privacy RPCSEC_GSS3_CREATE payload and verifies that
   the NFSv4 user id being asserted matches the source server's
   mapping of the user principal. If it does, the privilege is
   established on the source server as: <"copy_from_auth", user id,
skipping to change at page 24, line 45 skipping to change at page 26, line 6
established "copy_from_auth" privilege, and verifies that the
ccap_username equals the cfap_username.

o  If all verification succeeds, the "copy_confirm_auth" privilege is
   established on the source server as <"copy_confirm_auth",
   shared_secret_mic, user id>. Because the shared secret has been
   verified, the resultant copy_confirm_auth RPCSEC_GSSv3 child
   handle is noted to be acting on behalf of the user principal.

o  If the source server fails to verify the copy_from_auth privilege,
   the COPY_NOTIFY operation will be rejected with
   NFS4ERR_PARTNER_NO_AUTH.

o  If the destination server fails to verify the copy_to_auth or
   copy_confirm_auth privilege, the COPY will be rejected with
   NFS4ERR_PARTNER_NO_AUTH, causing the client to destroy the
   associated copy_from_auth and copy_to_auth RPCSEC_GSSv3 structured
   privilege assertion handles.

o  All subsequent ONC RPC READ requests sent from the destination to
   copy data from the source to the destination will use the
   RPCSEC_GSSv3 copy_confirm_auth child handle.

Note that the use of the "copy_confirm_auth" privilege accomplishes
the following:

o  If a protocol like NFS is being used, with export policies, export
   policies can be overridden in case the destination server as-an-
skipping to change at page 26, line 37 skipping to change at page 27, line 50
minimal level of protection for the server-to-server copy protocol is
possible.

In the absence of a strong security mechanism designed for the
purpose, the challenge is how the source server and destination
server identify themselves to each other, especially in the presence
of multi-homed source and destination servers. In a multi-homed
environment, the destination server might not contact the source
server from the same network address specified by the client in the
COPY_NOTIFY. The cnr_stateid returned from the COPY_NOTIFY can be
used to uniquely identify the destination server to the source
server. The use of cnr_stateid provides initial authentication of
the destination server, but cannot defend against man-in-the-middle
attacks after authentication or an eavesdropper that observes the
opaque stateid on the wire. Other secure communication techniques
(e.g., IPsec) are necessary to block these attacks.

Servers SHOULD reject COPY_NOTIFY requests that do not use RPCSEC_GSS
with privacy, thus ensuring the cnr_stateid in the COPY_NOTIFY reply
is encrypted. For the same reason, clients SHOULD send COPY requests
to the destination using RPCSEC_GSS with privacy.
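
The role of cnr_stateid in a multi-homed environment can be sketched
as a lookup table on the source server. The class below is
hypothetical; it models the stateid as 12 random bytes and the
destination as a plain string, neither of which is mandated by the
text.

```python
import secrets
from typing import Dict, Optional


class CopyNotifyRegistry:
    """Source-server table of outstanding COPY_NOTIFY authorizations."""

    def __init__(self) -> None:
        self._by_stateid: Dict[bytes, str] = {}

    def register(self, destination: str) -> bytes:
        """Record an expected destination and return a fresh opaque stateid."""
        stateid = secrets.token_bytes(12)
        self._by_stateid[stateid] = destination
        return stateid

    def identify(self, stateid: bytes, peer_address: str) -> Optional[str]:
        """Identify the destination by stateid alone.

        peer_address is deliberately ignored: a multi-homed destination
        may connect from an address other than the one named in
        COPY_NOTIFY.
        """
        return self._by_stateid.get(stateid)
```

Because the lookup depends only on the opaque value, anyone who
observes the stateid can present it, which is why the text calls for
privacy protection on the exchanges that carry it.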
4.10.1.3. Inter-Server Copy without ONC RPC

The same techniques as Section 4.10.1.2, using unique URLs for each
destination server, can be used for other protocols (e.g., HTTP
[RFC7230] and FTP [RFC959]) as well.
5. Support for Application I/O Hints

Applications can issue client I/O hints via posix_fadvise()
[posix_fadvise] to the NFS client. While this can help the NFS
client optimize I/O and caching for a file, it does not allow the NFS
server and its exported file system to do likewise. We add an
IO_ADVISE procedure (Section 15.5) to communicate the client file
access patterns to the NFS server. The NFS server upon receiving an
IO_ADVISE operation MAY choose to alter its I/O and caching behavior,
but is under no obligation to do so.

Application specific NFS clients such as those used by hypervisors
and databases can also leverage application hints to communicate
their specialized requirements.
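
For orientation, the following minimal sketch shows the kind of
application-side hint this section refers to, using Python's
os.posix_fadvise() wrapper. The helper name is hypothetical, and
whether a given NFS client forwards such a hint to the server as
IO_ADVISE is an implementation matter that this sketch does not
address.

```python
import os


def advise_sequential(path: str) -> bool:
    """Hint that `path` will be read sequentially.

    Returns True if the hint was accepted by the local system.
    """
    if not hasattr(os, "posix_fadvise"):
        return False  # platform does not expose posix_fadvise()
    fd = os.open(path, os.O_RDONLY)
    try:
        # Offset 0 with length 0 applies the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

Like IO_ADVISE itself, the hint is advisory: a False return only
means the hint could not be delivered, not that subsequent reads will
fail.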
6. Sparse Files

A sparse file is a common way of representing a large file without
having to utilize all of the disk space for it. Consequently, a
sparse file uses less physical space than its size indicates. This
means the file contains 'holes', byte ranges within the file that
contain no data. Most modern file systems support sparse files,
including most UNIX file systems and NTFS, but notably not Apple's
HFS+. Common examples of sparse files include Virtual Machine (VM)
OS/disk images, database files, log files, and even checkpoint
recovery files most commonly used by the HPC community.

In addition many modern file systems support the concept of
'unwritten' or 'uninitialized' blocks, which have uninitialized space
allocated to them on disk, but will return zeros until data is
written to them. Such functionality is already present in the data
model of the pNFS Block/Volume Layout (see [RFC5663]). Uninitialized
blocks can be thought of as holes inside a space reservation window.
If an application reads a hole in a sparse file, the file system must
return all zeros to the application. For local data access there is
little penalty, but with NFS these zeroes must be transferred back to
the client. If an application uses the NFS client to read data into
memory, this wastes time and bandwidth as the application waits for
the zeroes to be transferred.

A sparse file is typically created by initializing the file to be all
zeros: nothing is written to the data in the file; instead, the hole
is recorded in the metadata for the file. So an 8G disk image might
be represented initially by a few hundred bits in the metadata (on
UNIX file systems, the inode) and nothing on the disk. If the VM
then writes 100M to a file in the middle of the image, there would
now be two holes represented in the metadata and 100M in the data.
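
The disk image example can be worked through with a small sketch that
derives the holes from a list of written extents. The function and
its (offset, length) representation are illustrative assumptions, as
is placing the 100M write at the 4G mark.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, length) in bytes

G = 1 << 30
M = 1 << 20


def holes(file_size: int, data_extents: List[Extent]) -> List[Extent]:
    """Return the byte ranges of the file not covered by any data extent."""
    result = []
    pos = 0
    for off, length in sorted(data_extents):
        if off > pos:
            result.append((pos, off - pos))  # gap before this extent
        pos = max(pos, off + length)
    if pos < file_size:
        result.append((pos, file_size - pos))  # trailing gap
    return result


# A fresh 8G image is one hole; a 100M write in the middle splits it.
empty_image = holes(8 * G, [])
written_image = holes(8 * G, [(4 * G, 100 * M)])
```

Here empty_image holds a single hole covering the whole file, while
written_image holds two holes, one on each side of the 100M of data.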
No new operation is needed to allow the creation of a sparsely
populated file; when a file is created and a write occurs past the
current size of the file, the non-allocated region will either be a
hole or filled with zeros. The choice of behavior is dictated by the
underlying file system and is transparent to the application. What
is needed are the abilities to read sparse files and to punch holes
to reinitialize the contents of a file.

Two new operations DEALLOCATE (Section 15.4) and READ_PLUS
(Section 15.10) are introduced. DEALLOCATE allows for the hole
punching, where an application might want to reset the allocation and
reservation status of a range of the file. READ_PLUS supports all
the features of READ but includes an extension to support sparse
files. READ_PLUS is guaranteed to perform no worse than READ, and
can dramatically improve performance with sparse files. READ_PLUS
does not depend on pNFS protocol features, but can be used by pNFS to
support sparse files.
6.1. Terminology

Regular file: An object of file type NF4REG or NF4NAMEDATTR.

Sparse file: A Regular file that contains one or more holes.

Hole: A byte range within a Sparse file that contains regions of all
zeroes. A hole might or might not have space allocated or
reserved to it.
6.2. New Operations

6.2.1. READ_PLUS

READ_PLUS is a new variant of the NFSv4.1 READ operation [RFC5661].
Besides being able to support all of the data semantics of the READ
operation, it can also be used by the client and server to
efficiently transfer holes. Note that as the client has no a priori
knowledge of whether a hole is present or not, if the client supports
READ_PLUS and so does the server, then it should always use the
READ_PLUS operation in preference to the READ operation.

READ_PLUS extends the response with a new arm representing holes to
skipping to change at page 29, line 20 skipping to change at page 30, line 32
When a client sends a READ operation, it is not prepared to accept a
READ_PLUS-style response providing a compact encoding of the scope of
holes. If a READ occurs on a sparse file, then the server must
expand such data to be raw bytes. If a READ occurs in the middle of
a hole, the server can only send back bytes starting from that
offset. By contrast, if a READ_PLUS occurs in the middle of a hole,
the server can send back a range which starts before the offset and
extends past the range.
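
The contrast between the two replies can be sketched with a toy model
in which a file is described only by its holes. The function names
and the ("hole", offset, length) tuple are illustrative and do not
reflect the XDR encoding of either operation.

```python
from typing import List, Optional, Tuple

Hole = Tuple[int, int]  # (offset, length) in bytes


def find_hole(holes: List[Hole], offset: int) -> Optional[Hole]:
    """Return the hole that contains `offset`, if any."""
    for h_off, h_len in holes:
        if h_off <= offset < h_off + h_len:
            return (h_off, h_len)
    return None


def read_in_hole(holes: List[Hole], offset: int, count: int) -> bytes:
    """READ-style reply: the hole is expanded into raw zero bytes."""
    hole = find_hole(holes, offset)
    if hole is None:
        raise ValueError("offset is not inside a hole")
    end = min(offset + count, hole[0] + hole[1])
    return bytes(end - offset)  # zero bytes, starting at the request offset


def read_plus_in_hole(holes: List[Hole], offset: int,
                      count: int) -> Tuple[str, int, int]:
    """READ_PLUS-style reply: describe the whole hole, not its bytes.

    `count` is unused because the reply may extend past the request.
    """
    hole = find_hole(holes, offset)
    if hole is None:
        raise ValueError("offset is not inside a hole")
    return ("hole", hole[0], hole[1])
```

For a 1M hole at the start of the file, an 8k request at offset 4k
costs 8192 zero bytes with the READ-style reply, while the
READ_PLUS-style reply is a single descriptor covering the entire
hole.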
6.2.2. DEALLOCATE

DEALLOCATE can be used to hole punch, which allows the client to
avoid the transfer of a repetitive pattern of zeros across the
network.
7. Space Reservation

Applications want to be able to reserve space for a file, report the
amount of actual disk space a file occupies, and free up the backing
space of a file when it is not required.

One example is the posix_fallocate operation ([posix_fallocate])
which allows applications to ask for space reservations from the
operating system, usually to provide a better file layout and reduce
overhead for random or slow growing file appending workloads.
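
A minimal sketch of such a reservation request, using Python's
os.posix_fallocate() wrapper, is shown here. The helper name is
hypothetical, and how the request is carried to an NFS server is
outside the scope of the sketch.

```python
import os


def reserve_space(path: str, nbytes: int) -> bool:
    """Ask the operating system to reserve `nbytes` for `path`.

    Returns True if the reservation succeeded.
    """
    if not hasattr(os, "posix_fallocate"):
        return False  # platform does not expose posix_fallocate()
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the byte range [0, nbytes); the file grows if needed.
        os.posix_fallocate(fd, 0, nbytes)
        return True
    except OSError:
        return False  # e.g., not enough free space
    finally:
        os.close(fd)
```

On success, later writes within the reserved range are not expected
to fail for lack of space, which is the guarantee the applications
described in this section are after.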
Another example is space reservation for virtual disks in a
hypervisor. In virtualized environments, virtual disk files are
often stored on NFS mounted volumes. When a hypervisor creates a
virtual disk file, it often tries to preallocate the space for the
file so that there are no future allocation related errors during the
operation of the virtual machine. Such errors prevent a virtual
machine from continuing execution and result in downtime.

Currently, in order to achieve such a guarantee, applications zero
skipping to change at page 30, line 21 skipping to change at page 31, line 32
that would be freed when a file is deleted. Currently, NFS reports
two size attributes:

size  The logical file size of the file.

space_used  The size in bytes that the file occupies on disk.

While these attributes are sufficient for space accounting in
traditional file systems, they prove to be inadequate in modern file
systems that support block sharing. In such file systems, multiple
inodes (the metadata portion of the file system object) can point to
a single block with a block reference count to guard against
premature freeing. Having a way to tell the number of blocks that
would be freed if the file was deleted would be useful to
applications that wish to migrate files when a volume is low on
space.
Since virtual disks represent a hard drive in a virtual machine, a
virtual disk can be viewed as a file system within a file. Since not
all blocks within a file system are in use, there is an opportunity
to reclaim blocks that are no longer in use. A call to deallocate
blocks could result in better space efficiency. Less space MAY be
consumed for backups after block deallocation.

The following operations and attributes can be used to resolve these
issues:

space_freed  This attribute reports the space that would be freed
   when a file is deleted, taking block sharing into consideration.

DEALLOCATE  This operation deallocates the blocks backing a region of
   the file.
If space_used of a file is interpreted to mean the size in bytes of
all disk blocks pointed to by the inode of the file, then shared
blocks get double counted, over-reporting the space utilization.
This also has the adverse effect that the deletion of a file with
shared blocks frees up less than space_used bytes.
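
The double counting can be made concrete with a toy model in which
each file is a set of block identifiers. The 4096-byte block size,
the file names, and both function names are assumptions chosen for
illustration.

```python
from collections import Counter
from typing import Dict, Set

BLOCK_SIZE = 4096  # assumed block size in bytes


def space_used_all_blocks(files: Dict[str, Set[int]], name: str) -> int:
    """space_used counted over every block the file points to."""
    return len(files[name]) * BLOCK_SIZE


def space_freed(files: Dict[str, Set[int]], name: str) -> int:
    """Bytes released by deleting `name`: blocks no other file references."""
    refcount = Counter(b for blocks in files.values() for b in blocks)
    return sum(1 for b in files[name] if refcount[b] == 1) * BLOCK_SIZE


# Files "a" and "b" share block 3.
files = {"a": {1, 2, 3}, "b": {3, 4}}
total_reported = sum(space_used_all_blocks(files, n) for n in files)
total_on_disk = len(set().union(*files.values())) * BLOCK_SIZE
```

In this model total_reported is five blocks although only four exist
on disk, and deleting "a" frees two blocks even though its space_used
is three.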
On the other hand, if space_used is interpreted to mean the size in
bytes of those disk blocks unique to the inode of the file, then
shared blocks are not counted in any file, resulting in under-
skipping to change at page 31, line 45 skipping to change at page 33, line 11
The format of the header is application specific, but there are two
main components typically encountered:

1. An Application Data Block Number (ADBN) which allows the
   application to determine which data block is being referenced.
   This is useful when the client is not storing the blocks in
   contiguous memory, i.e., a logical block number.

2. Fields to describe the state of the ADB and a means to detect
   block corruption. For both pieces of data, a useful property
   would be that the allowed values are specially selected so that
   if passed across the network, corruption due to translation
   between big and little endian architectures is detectable. For
   example, 0xF0DEDEF0 has the same 32-bit pattern in both
   architectures, making it inappropriate.
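
The property described in item 2 can be checked mechanically. The
sketch below uses hypothetical helper names: it reverses the byte
order of a 32-bit value and reports whether the result differs from
the original.

```python
def byte_swapped32(value: int) -> int:
    """Return `value` with the byte order of its 32-bit form reversed."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


def detects_endian_swap(value: int) -> bool:
    """True if an accidental byte swap of `value` would be noticeable."""
    return byte_swapped32(value) != value


# 0xF0DEDEF0 reads the same in either byte order, so a swap goes unseen.
palindrome_ok = detects_endian_swap(0xF0DEDEF0)  # False
# 0xFEEDFACE becomes 0xCEFAEDFE when swapped, so a swap is detectable.
guard_ok = detects_endian_swap(0xFEEDFACE)       # True
```

An application choosing guard patterns or state values could run a
check of this kind over its candidate constants before adopting them.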
Applications already impose structures on files [Strohm11] and detect
corruption in data blocks [Ashdown08]. What they are not able to do
is efficiently transfer and store ADBs. To initialize a file with
ADBs, the client must send each full ADB to the server and that must
be stored on the server.

In this section, we define a framework for transferring the ADB from
client to server and present one approach to detecting corruption in
a given ADB implementation.
skipping to change at page 35, line 5 skipping to change at page 36, line 19
The hypothetical application presented in Section 8.2 can be used to The hypothetical application presented in Section 8.2 can be used to
illustrate how READ_PLUS would return an array of results. A file is illustrate how READ_PLUS would return an array of results. A file is
created and initialized with 100 4k ADBs in the FREE state with the created and initialized with 100 4k ADBs in the FREE state with the
WRITE_SAME operation (see Section 15.12): WRITE_SAME operation (see Section 15.12):
WRITE_SAME {0, 4k, 100, 0, 0, 8, 0xfeedface} WRITE_SAME {0, 4k, 100, 0, 0, 8, 0xfeedface}
Further, assume the application writes a single ADB at 16k, changing Further, assume the application writes a single ADB at 16k, changing
the guard pattern to 0xcafedead, we would then have in memory: the guard pattern to 0xcafedead, we would then have in memory:
0k -> (4k - 1) : 00 00 00 00 fe ed fa ce 00 00 ... 00 00 0k -> (4k - 1) : 00 00 00 00 ... fe ed fa ce 00 00 ... 00
4k -> (8k - 1) : 00 00 00 01 fe ed fa ce 00 00 ... 00 00 4k -> (8k - 1) : 00 00 00 01 ... fe ed fa ce 00 00 ... 00
8k -> (12k - 1) : 00 00 00 02 fe ed fa ce 00 00 ... 00 00 8k -> (12k - 1) : 00 00 00 02 ... fe ed fa ce 00 00 ... 00
12k -> (16k - 1) : 00 00 00 03 fe ed fa ce 00 00 ... 00 00 12k -> (16k - 1) : 00 00 00 03 ... fe ed fa ce 00 00 ... 00
16k -> (20k - 1) : 00 00 00 04 ca fe de ad 00 00 ... 00 00 16k -> (20k - 1) : 00 00 00 04 ... ca fe de ad 00 00 ... 00
20k -> (24k - 1) : 00 00 00 05 fe ed fa ce 00 00 ... 00 00 20k -> (24k - 1) : 00 00 00 05 ... fe ed fa ce 00 00 ... 00
24k -> (28k - 1) : 00 00 00 06 fe ed fa ce 00 00 ... 00 00 24k -> (28k - 1) : 00 00 00 06 ... fe ed fa ce 00 00 ... 00
... ...
396k -> (400k - 1) : 00 00 00 63 fe ed fa ce 00 00 ... 00 00 396k -> (400k - 1) : 00 00 00 63 ... fe ed fa ce 00 00 ... 00
And when the client did a READ_PLUS of 64k at the start of the file, And when the client did a READ_PLUS of 64k at the start of the file,
it could get back a result of data: it could get back a result of data:
0k -> (4k - 1) : 00 00 00 00 fe ed fa ce 00 00 ... 00 00 0k -> (4k - 1) : 00 00 00 00 ... fe ed fa ce 00 00 ... 00
4k -> (8k - 1) : 00 00 00 01 fe ed fa ce 00 00 ... 00 00 4k -> (8k - 1) : 00 00 00 01 ... fe ed fa ce 00 00 ... 00
8k -> (12k - 1) : 00 00 00 02 fe ed fa ce 00 00 ... 00 00 8k -> (12k - 1) : 00 00 00 02 ... fe ed fa ce 00 00 ... 00
12k -> (16k - 1) : 00 00 00 03 fe ed fa ce 00 00 ... 00 00 12k -> (16k - 1) : 00 00 00 03 ... fe ed fa ce 00 00 ... 00
16k -> (20k - 1) : 00 00 00 04 ca fe de ad 00 00 ... 00 00 16k -> (20k - 1) : 00 00 00 04 ... ca fe de ad 00 00 ... 00
20k -> (24k - 1) : 00 00 00 05 fe ed fa ce 00 00 ... 00 00 20k -> (24k - 1) : 00 00 00 05 ... fe ed fa ce 00 00 ... 00
24k -> (28k - 1) : 00 00 00 06 fe ed fa ce 00 00 ... 00 00 24k -> (28k - 1) : 00 00 00 06 ... fe ed fa ce 00 00 ... 00
... ...
60k -> (64k - 1) : 00 00 00 0f fe ed fa ce 00 00 ... 00 00 60k -> (64k - 1) : 00 00 00 0f ... fe ed fa ce 00 00 ... 00
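The layout above can be reproduced with a short sketch. It follows the newer (right-hand) column, in which the guard pattern sits at relative offset 8. The helper names `write_same` and `dump` are hypothetical, and the 4-byte big-endian encoding of the block number is an assumption read off the memory dump, not something the text states.

```python
import struct

K = 1024

def write_same(buf, offset, block_size, block_count,
               reloff_blocknum, block_num, reloff_pattern, pattern):
    """Fill block_count ADBs in buf, mirroring the WRITE_SAME argument order."""
    for i in range(block_count):
        base = offset + i * block_size
        # Assumption: the block number is a 4-byte big-endian integer.
        struct.pack_into(">I", buf, base + reloff_blocknum, block_num + i)
        # The guard pattern is copied verbatim at its relative offset.
        start = base + reloff_pattern
        buf[start:start + len(pattern)] = pattern
    return buf

def dump(buf, start, n=12):
    """Hex dump of the first n bytes of the block starting at 'start'."""
    return buf[start:start + n].hex(" ")

# WRITE_SAME {0, 4k, 100, 0, 0, 8, 0xfeedface}
image = bytearray(100 * 4 * K)
write_same(image, 0, 4 * K, 100, 0, 0, 8, bytes.fromhex("feedface"))

# The application rewrites the single ADB at 16k with a new guard pattern.
write_same(image, 16 * K, 4 * K, 1, 0, 4, 8, bytes.fromhex("cafedead"))

print(dump(image, 16 * K))
```

A READ_PLUS of the first 64k would then correspond to `image[:64 * K]` in this model, with only the block at 16k carrying the changed guard pattern.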
8.4. An Example of Zeroing Space 8.4. An Example of Zeroing Space
A simpler use case for WRITE_SAME is an application that wants to A simpler use case for WRITE_SAME is an application that wants to
efficiently zero out a file but does not want to modify space efficiently zero out a file but does not want to modify space
reservations. This can easily be achieved by a call to WRITE_SAME reservations. This can easily be achieved by a call to WRITE_SAME
without ADB block numbers and pattern, e.g.: without ADB block numbers and pattern, e.g.:
WRITE_SAME {0, 1k, 10000, 0, 0, 0, 0} WRITE_SAME {0, 1k, 10000, 0, 0, 0, 0}
9. Labeled NFS 9. Labeled NFS
9.1. Introduction
Access control models such as Unix permissions or Access Control Access control models such as Unix permissions or Access Control
Lists are commonly referred to as Discretionary Access Control (DAC) Lists are commonly referred to as Discretionary Access Control (DAC)
models. These systems base their access decisions on user identity models. These systems base their access decisions on user identity
and resource ownership. In contrast Mandatory Access Control (MAC) and resource ownership. In contrast Mandatory Access Control (MAC)
models base their access control decisions on the label on the models base their access control decisions on the label on the
subject (usually a process) and the object it wishes to access subject (usually a process) and the object it wishes to access
[RFC7204]. These labels may contain user identity information but [RFC4949]. These labels may contain user identity information but
usually contain additional information. In DAC systems users are usually contain additional information. In DAC systems users are
free to specify the access rules for resources that they own. MAC free to specify the access rules for resources that they own. MAC
models base their security decisions on a system wide policy models base their security decisions on a system wide policy
established by an administrator or organization which the users do established by an administrator or organization which the users do
not have the ability to override. In this section, we add a MAC not have the ability to override. In this section, we add a MAC
model to NFSv4.2. model to NFSv4.2.
First we provide a method for transporting and storing security label First we provide a method for transporting and storing security label
data on NFSv4 file objects. Security labels have several semantics data on NFSv4 file objects. Security labels have several semantics
that are met by NFSv4 recommended attributes such as the ability to that are met by NFSv4 recommended attributes such as the ability to
set the label value upon object creation. Access control on these set the label value upon object creation. Access control on these
attributes is done through a combination of two mechanisms. As with attributes is done through a combination of two mechanisms. As with
other recommended attributes on file objects the usual DAC checks other recommended attributes on file objects the usual DAC checks,
(ACLs and permission bits) will be performed to ensure that proper Access Control Lists (ACLs) and permission bits, will be performed to
file ownership is enforced. In addition a MAC system MAY be employed ensure that proper file ownership is enforced. In addition a MAC
on the client, server, or both to enforce additional policy on what system MAY be employed on the client, server, or both to enforce
subjects may modify security label information. additional policy on what subjects may modify security label
information.
Second, we describe a method for the client to determine if an NFSv4 Second, we describe a method for the client to determine if an NFSv4
file object security label has changed. A client which needs to know file object security label has changed. A client which needs to know
if a label on a file or set of files is going to change SHOULD if a label on a file or set of files is going to change SHOULD
request a delegation on each labeled file. In order to change such a request a delegation on each labeled file. In order to change such a
security label, the server will have to recall delegations on any security label, the server will have to recall delegations on any
file affected by the label change, so informing clients of the label file affected by the label change, so informing clients of the label
change. change.
An additional useful feature would be modification to the RPC layer An additional useful feature would be modification to the RPC layer
used by NFSv4 to allow RPC calls to carry security labels and enable used by NFSv4 to allow RPC calls to assert client process subject
full mode enforcement as described in Section 9.6.1. Such security labels and enable full mode enforcement as described in
modifications are outside the scope of this document (see Section 9.5.1. Such modifications are outside the scope of this
[rpcsec_gssv3]). document (see [rpcsec_gssv3]).
9.2. Definitions 9.1. Definitions
Label Format Specifier (LFS): is an identifier used by the client to Label Format Specifier (LFS): is an identifier used by the client to
establish the syntactic format of the security label and the establish the syntactic format of the security label and the
semantic meaning of its components. These specifiers exist in a semantic meaning of its components. These specifiers exist in a
registry associated with documents describing the format and registry associated with documents describing the format and
semantics of the label. semantics of the label.
Label Format Registry: is the IANA registry (see [Quigley14]) Label Format Registry: is the IANA registry (see [RFC7569])
containing all registered LFSes along with references to the containing all registered LFSes along with references to the
documents that describe the syntactic format and semantics of the documents that describe the syntactic format and semantics of the
security label. security label.
Policy Identifier (PI): is an optional part of the definition of a Policy Identifier (PI): is an optional part of the definition of a
Label Format Specifier which allows for clients and server to Label Format Specifier which allows for clients and server to
identify specific security policies. identify specific security policies.
Object: is a passive resource within the system that we wish to be Object: is a passive resource within the system that we wish to be
protected. Objects can be entities such as files, directories, protected. Objects can be entities such as files, directories,
skipping to change at page 37, line 20 skipping to change at page 38, line 34
MAC-Aware: is a server which can transmit and store object labels. MAC-Aware: is a server which can transmit and store object labels.
MAC-Functional: is a client or server which is Labeled NFS enabled. MAC-Functional: is a client or server which is Labeled NFS enabled.
Such a system can interpret labels and apply policies based on the Such a system can interpret labels and apply policies based on the
security system. security system.
Multi-Level Security (MLS): is a traditional model where objects are Multi-Level Security (MLS): is a traditional model where objects are
given a sensitivity level (Unclassified, Secret, Top Secret, etc) given a sensitivity level (Unclassified, Secret, Top Secret, etc)
and a category set (see [BL73], [RFC1108], and [RFC2401]). and a category set (see [BL73], [RFC1108], and [RFC2401]).
9.3. MAC Security Attribute 9.2. MAC Security Attribute
MAC models base access decisions on security attributes bound to MAC models base access decisions on security attributes bound to
subjects and objects. This information can range from a user subjects (usually processes) and objects (for NFS, file objects).
identity for an identity based MAC model, sensitivity levels for This information can range from a user identity for an identity based
Multi-level security, or a type for Type Enforcement. These models MAC model, sensitivity levels for Multi-level security, or a type for
base their decisions on different criteria but the semantics of the Type Enforcement. These models base their decisions on different
security attribute remain the same. The semantics required by the criteria but the semantics of the security attribute remain the same.
security attributes are listed below: The semantics required by the security attributes are listed below:
o MUST provide flexibility with respect to the MAC model. o MUST provide flexibility with respect to the MAC model.
o MUST provide the ability to atomically set security information o MUST provide the ability to atomically set security information
upon object creation. upon object creation.
o MUST provide the ability to enforce access control decisions both o MUST provide the ability to enforce access control decisions both
on the client and the server. on the client and the server.
o MUST NOT expose an object to either the client or server name o MUST NOT expose an object to either the client or server name
space before its security information has been bound to it. space before its security information has been bound to it.
NFSv4 implements the security attribute as a recommended attribute. NFSv4 implements the security attribute as a recommended attribute.
These attributes have a fixed format and semantics, which conflicts These attributes have a fixed format and semantics, which conflicts
with the flexible nature of the security attribute. To resolve this with the flexible nature of the security attribute. To resolve this
the security attribute consists of two components. The first the security attribute consists of two components. The first
component is a LFS as defined in [Quigley14] to allow for component is a LFS as defined in [RFC7569] to allow for
interoperability between MAC mechanisms. The second component is an interoperability between MAC mechanisms. The second component is an
opaque field which is the actual security attribute data. To allow opaque field which is the actual security attribute data. To allow
for various MAC models, NFSv4 should be used solely as a transport for various MAC models, NFSv4 should be used solely as a transport
mechanism for the security attribute. It is the responsibility of mechanism for the security attribute. It is the responsibility of
the endpoints to consume the security attribute and make access the endpoints to consume the security attribute and make access
decisions based on their respective models. In addition, creation of decisions based on their respective models. In addition, creation of
objects through OPEN and CREATE allows for the security attribute to objects through OPEN and CREATE allows for the security attribute to
be specified upon creation. By providing an atomic create and set be specified upon creation. By providing an atomic create and set
operation for the security attribute it is possible to enforce the operation for the security attribute it is possible to enforce the
second and fourth requirements. The recommended attribute second and fourth requirements. The recommended attribute
FATTR4_SEC_LABEL (see Section 12.2.4) will be used to satisfy this FATTR4_SEC_LABEL (see Section 12.2.4) will be used to satisfy this
requirement. requirement.
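The two-component structure described above (an LFS plus opaque label data) can be pictured with the following sketch. The class name `SecLabel`, its field names, the optional policy identifier field, and the XDR-style encoding are assumptions made for illustration; the normative definition of FATTR4_SEC_LABEL is the one in Section 12.2.4.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class SecLabel:
    lfs: int     # Label Format Specifier: names the format of 'data'
    pi: int      # Policy Identifier; 0 is used here to mean "none" (assumption)
    data: bytes  # opaque security attribute; NFSv4 only transports it

    def encode(self) -> bytes:
        # XDR-style: three 32-bit big-endian words, then the opaque body
        # padded with zero bytes to a multiple of four.
        pad = (-len(self.data)) % 4
        header = struct.pack(">III", self.lfs, self.pi, len(self.data))
        return header + self.data + b"\x00" * pad

    @classmethod
    def decode(cls, raw: bytes) -> "SecLabel":
        lfs, pi, length = struct.unpack_from(">III", raw, 0)
        return cls(lfs, pi, raw[12:12 + length])

# Hypothetical LFS value and label text, for illustration only.
label = SecLabel(lfs=1, pi=0, data=b"s0:c1")
wire = label.encode()
```

Because the data field is opaque, the transport never needs to understand the MAC model; only the endpoints interpret it, using the LFS to select the right parser.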
9.3.1. Delegations 9.2.1. Delegations
In the event that a security attribute is changed on the server while In the event that a security attribute is changed on the server while
a client holds a delegation on the file, both the server and the a client holds a delegation on the file, both the server and the
client MUST follow the NFSv4.1 protocol (see Chapter 10 of [RFC5661]) client MUST follow the NFSv4.1 protocol (see Chapter 10 of [RFC5661])
with respect to attribute changes. The client SHOULD flush all changes back with respect to attribute changes. The client SHOULD flush all changes back
to the server and relinquish the delegation. to the server and relinquish the delegation.
9.3.2. Permission Checking 9.2.2. Permission Checking
It is not feasible to enumerate all possible MAC models and even It is not feasible to enumerate all possible MAC models and even
levels of protection within a subset of these models. This means levels of protection within a subset of these models. This means
that the NFSv4 client and servers cannot be expected to directly make that the NFSv4 client and servers cannot be expected to directly make
access control decisions based on the security attribute. Instead access control decisions based on the security attribute. Instead
NFSv4 should defer permission checking on this attribute to the host NFSv4 should defer permission checking on this attribute to the host
system. These checks are performed in addition to existing DAC and system. These checks are performed in addition to existing DAC and
ACL checks outlined in the NFSv4 protocol. Section 9.6 gives a ACL checks outlined in the NFSv4 protocol. Section 9.5 gives a
specific example of how the security attribute is handled under a specific example of how the security attribute is handled under a
particular MAC model. particular MAC model.
9.3.3. Object Creation 9.2.3. Object Creation
When creating files in NFSv4 the OPEN and CREATE operations are used. When creating files in NFSv4 the OPEN and CREATE operations are used.
One of the parameters to these operations is an fattr4 structure One of the parameters to these operations is an fattr4 structure
containing the attributes the file is to be created with. This containing the attributes the file is to be created with. This
allows NFSv4 to atomically set the security attribute of files upon allows NFSv4 to atomically set the security attribute of files upon
creation. When a client is MAC-Functional it must always provide the creation. When a client is MAC-Functional it must always provide the
initial security attribute upon file creation. In the event that the initial security attribute upon file creation. In the event that the
server is MAC-Functional as well, it should determine by policy server is MAC-Functional as well, it should determine by policy
whether it will accept the attribute from the client or instead make whether it will accept the attribute from the client or instead make
the determination itself. If the client is not MAC-Functional, then the determination itself. If the client is not MAC-Functional, then
the MAC-Functional server must decide on a default label. A more in the MAC-Functional server must decide on a default label. A more in
depth explanation can be found in Section 9.6. depth explanation can be found in Section 9.5.
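The choice of initial label described above can be sketched as a small decision function. The function and parameter names are hypothetical, and the behavior for a server that is not MAC-Functional (storing the client's label unchanged) is an assumption, since the paragraph does not cover that case.

```python
from typing import Callable, Optional

def initial_label(client_label: Optional[bytes],
                  server_mac_functional: bool,
                  server_accepts: Callable[[bytes], bool],
                  server_default: bytes) -> Optional[bytes]:
    """Pick the label bound to a new object at OPEN/CREATE time.

    client_label is None when the client is not MAC-Functional and
    therefore supplied no security attribute.
    """
    if not server_mac_functional:
        # Assumption: such a server simply stores what it was given.
        return client_label
    if client_label is not None and server_accepts(client_label):
        # Server policy accepts the attribute offered by the client.
        return client_label
    # Either the client sent no label or policy rejected it:
    # the server makes the determination itself.
    return server_default

accept_all = lambda label: True
accept_none = lambda label: False
```

In every branch the label is chosen before the object exists, which is what allows the create and the label assignment to remain a single atomic step.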
9.3.4. Existing Objects 9.2.4. Existing Objects
Note that under the MAC model, all objects must have labels. Note that under the MAC model, all objects must have labels.
Therefore, if an existing server is upgraded to include Labeled NFS Therefore, if an existing server is upgraded to include Labeled NFS
support, then it is the responsibility of the security system to support, then it is the responsibility of the security system to
define the behavior for existing objects. define the behavior for existing objects.
9.3.5. Label Changes 9.2.5. Label Changes
Consider a guest mode system (Section 9.6.2) in which the clients Consider a guest mode system (Section 9.5.2) in which the clients
enforce MAC checks and the server has only a DAC security system enforce MAC checks and the server has only a DAC security system
which stores the labels along with the file data. In this type of which stores the labels along with the file data. In this type of
system, a user with the appropriate DAC credentials on a client with system, a user with the appropriate DAC credentials on a client with
poorly configured or disabled MAC labeling enforcement is allowed poorly configured or disabled MAC labeling enforcement is allowed
access to the file label (and data) on the server and can change the access to the file label (and data) on the server and can change the
label. label.
Clients which need to know if a label on a file or set of files has Clients which need to know if a label on a file or set of files has
changed SHOULD request a delegation on each labeled file so that a changed SHOULD request a delegation on each labeled file so that a
label change by another client will be known via the process label change by another client will be known via the process
described in Section 9.3.1 which must be followed: the delegation described in Section 9.2.1 which must be followed: the delegation
will be recalled, which effectively notifies the client of the will be recalled, which effectively notifies the client of the
change. change.
Note that the MAC security policies on a client can be such that the Note that the MAC security policies on a client can be such that the
client does not have access to the file unless it has a delegation. client does not have access to the file unless it has a delegation.
9.4. pNFS Considerations 9.3. pNFS Considerations
The new FATTR4_SEC_LABEL attribute is metadata information and as The new FATTR4_SEC_LABEL attribute is metadata information and as
such the DS is not aware of the value contained on the MDS. such the storage device is not aware of the value contained on the
Fortunately, the NFSv4.1 protocol [RFC5661] already has provisions metadata server. Fortunately, the NFSv4.1 protocol [RFC5661] already
for doing access level checks from the DS to the MDS. In order for has provisions for doing access level checks from the storage device
the DS to validate the subject label presented by the client, it to the metadata server. In order for the storage device to validate
SHOULD utilize this mechanism. the subject label presented by the client, it SHOULD utilize this
mechanism.
9.5. Discovery of Server Labeled NFS Support 9.4. Discovery of Server Labeled NFS Support
The server can easily determine that a client supports Labeled NFS The server can easily determine that a client supports Labeled NFS
when it queries for the FATTR4_SEC_LABEL label for an object. The when it queries for the FATTR4_SEC_LABEL label for an object. The
client might need to discover which LFS the server supports. client might need to discover which LFS the server supports.
The following compound MUST NOT be denied by any MAC label check: The following compound MUST NOT be denied by any MAC label check:
PUTROOTFH, GETATTR {FATTR4_SEC_LABEL} PUTROOTFH, GETATTR {FATTR4_SEC_LABEL}
Note that the server might have imposed a security flavor on the root Note that the server might have imposed a security flavor on the root
that precludes such access. I.e., if the server requires kerberized that precludes such access. I.e., if the server requires kerberized
access and the client presents a compound with AUTH_SYS, then the access and the client presents a compound with AUTH_SYS, then the
server is allowed to return NFS4ERR_WRONGSEC in this case. But if server is allowed to return NFS4ERR_WRONGSEC in this case. But if
the client presents a correct security flavor, then the server MUST the client presents a correct security flavor, then the server MUST
return the FATTR4_SEC_LABEL attribute with the supported LFS filled return the FATTR4_SEC_LABEL attribute with the supported LFS filled
in. in.
9.6. MAC Security NFS Modes of Operation 9.5. MAC Security NFS Modes of Operation
A system using Labeled NFS may operate in two modes. The first mode A system using Labeled NFS may operate in two modes. The first mode
provides the most protection and is called "full mode". In this mode provides the most protection and is called "full mode". In this mode
both the client and server implement a MAC model allowing each end to both the client and server implement a MAC model allowing each end to
make an access control decision. The remaining mode is called the make an access control decision. The remaining mode is called the
"guest mode" and in this mode one end of the connection is not "guest mode" and in this mode one end of the connection is not
implementing a MAC model and thus offers less protection than full implementing a MAC model and thus offers less protection than full
mode. mode.
9.6.1. Full Mode 9.5.1. Full Mode
Full mode environments consist of MAC-Functional NFSv4 servers and Full mode environments consist of MAC-Functional NFSv4 servers and
clients and may be composed of mixed MAC models and policies. The clients and may be composed of mixed MAC models and policies. The
system requires that both the client and server have an opportunity system requires that both the client and server have an opportunity
to perform an access control check based on all relevant information to perform an access control check based on all relevant information
within the network. The file object security attribute is provided within the network. The file object security attribute is provided
using the mechanism described in Section 9.3. using the mechanism described in Section 9.2.
Fully MAC-Functional NFSv4 servers are not possible in the absence of Fully MAC-Functional NFSv4 servers are not possible in the absence of
RPCSEC_GSSv3 [rpcsec_gssv3] support for subject label transport. RPCSEC_GSSv3 [rpcsec_gssv3] support for client process subject label
However, servers may make decisions based on the RPC credential assertion. However, servers may make decisions based on the RPC
information available. credential information available.
9.6.1.1. Initial Labeling and Translation 9.5.1.1. Initial Labeling and Translation
The ability to create a file is an action that a MAC model may wish The ability to create a file is an action that a MAC model may wish
to mediate. The client is given the responsibility to determine the to mediate. The client is given the responsibility to determine the
initial security attribute to be placed on a file. This allows the initial security attribute to be placed on a file. This allows the
client to make a decision as to the acceptable security attributes to client to make a decision as to the acceptable security attributes to
create a file with before sending the request to the server. Once create a file with before sending the request to the server. Once
the server receives the creation request from the client it may the server receives the creation request from the client it may
choose to evaluate if the security attribute is acceptable. choose to evaluate if the security attribute is acceptable.
Security attributes on the client and server may vary based on MAC Security attributes on the client and server may vary based on MAC
skipping to change at page 41, line 5 skipping to change at page 42, line 21
identify the format and meaning of the opaque portion of the security identify the format and meaning of the opaque portion of the security
attribute. A full mode environment may contain hosts operating in attribute. A full mode environment may contain hosts operating in
several different LFSes. In this case a mechanism for translating several different LFSes. In this case a mechanism for translating
the opaque portion of the security attribute is needed. The actual the opaque portion of the security attribute is needed. The actual
translation function will vary based on MAC model and policy and is translation function will vary based on MAC model and policy and is
out of the scope of this document. If a translation is unavailable out of the scope of this document. If a translation is unavailable
for a given LFS then the request MUST be denied. Another recourse is for a given LFS then the request MUST be denied. Another recourse is
to allow the host to provide a fallback mapping for unknown security to allow the host to provide a fallback mapping for unknown security
attributes. attributes.
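The translation rule above can be sketched as a lookup with an optional fallback. The translator table, the exception name, and the fallback parameter are hypothetical; the actual translation functions depend on the MAC model and policy and are out of scope here, as the text says.

```python
from typing import Callable, Dict, Optional, Tuple

class TranslationUnavailable(Exception):
    """No mapping exists for the label's LFS; the request is denied."""

Translator = Callable[[bytes], bytes]

def translate_label(lfs: int, data: bytes, local_lfs: int,
                    translators: Dict[Tuple[int, int], Translator],
                    fallback: Optional[bytes] = None) -> bytes:
    if lfs == local_lfs:
        # Same format on both hosts: nothing to translate.
        return data
    translate = translators.get((lfs, local_lfs))
    if translate is not None:
        return translate(data)
    if fallback is not None:
        # Host-provided fallback mapping for unknown security attributes.
        return fallback
    raise TranslationUnavailable(f"no translation from LFS {lfs} to {local_lfs}")

# Hypothetical table: one translator from LFS 1 to LFS 2.
translators = {(1, 2): lambda d: d.upper()}
```

A caller would treat `TranslationUnavailable` as a denial of the request, and would pass a fallback only where host policy allows unknown labels to be mapped to a default.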
9.6.1.2. Policy Enforcement 9.5.1.2. Policy Enforcement
In full mode access control decisions are made by both the clients In full mode access control decisions are made by both the clients
and servers. When a client makes a request it takes the security and servers. When a client makes a request it takes the security
attribute from the requesting process and makes an access control attribute from the requesting process and makes an access control
decision based on that attribute and the security attribute of the decision based on that attribute and the security attribute of the
object it is trying to access. If the client denies that access an object it is trying to access. If the client denies that access an
RPC call to the server is never made. If however the access is RPC call to the server is never made. If however the access is
allowed the client will make a call to the NFS server. allowed the client will make a call to the NFS server.
When the server receives the request from the client it uses any When the server receives the request from the client it uses any
skipping to change at page 41, line 31 skipping to change at page 42, line 47
Future protocol extensions may also allow the server to factor into Future protocol extensions may also allow the server to factor into
the decision a security label extracted from the RPC request. the decision a security label extracted from the RPC request.
Implementations MAY validate security attributes supplied over the Implementations MAY validate security attributes supplied over the
network to ensure that they are within a set of attributes permitted network to ensure that they are within a set of attributes permitted
from a specific peer, and if not, reject them. Note that a system from a specific peer, and if not, reject them. Note that a system
may permit a different set of attributes to be accepted from each may permit a different set of attributes to be accepted from each
peer. peer.
9.6.1.3. Limited Server 9.5.1.3. Limited Server
A Limited Server mode (see Section 4.2 of [RFC7204]) consists of a A Limited Server mode (see Section 4.2 of [RFC7204]) consists of a
server which is label aware, but does not enforce policies. Such a server which is label aware, but does not enforce policies. Such a
server will store and retrieve all object labels presented by server will store and retrieve all object labels presented by
clients, utilize the methods described in Section 9.3.5 to allow the clients, utilize the methods described in Section 9.2.5 to allow the
clients to detect changing labels, but may not factor the label into clients to detect changing labels, but may not factor the label into
access decisions. Instead, it will expect the clients to enforce all access decisions. Instead, it will expect the clients to enforce all
such access locally. such access locally.
9.6.2. Guest Mode 9.5.2. Guest Mode
Guest mode implies that either the client or the server does not Guest mode implies that either the client or the server does not
handle labels. If the client is not Labeled NFS aware, then it will handle labels. If the client is not Labeled NFS aware, then it will
not offer subject labels to the server. The server is the only not offer subject labels to the server. The server is the only
entity enforcing policy, and may selectively provide standard NFS entity enforcing policy, and may selectively provide standard NFS
services to clients based on their authentication credentials and/or services to clients based on their authentication credentials and/or
associated network attributes (e.g., IP address, network interface). associated network attributes (e.g., IP address, network interface).
The level of trust and access extended to a client in this mode is The level of trust and access extended to a client in this mode is
configuration-specific. If the server is not Labeled NFS aware, then configuration-specific. If the server is not Labeled NFS aware, then
it will not return object labels to the client. Clients in this it will not return object labels to the client. Clients in this
environment may consist of groups implementing different MAC environment may consist of groups implementing different MAC
model policies. The system requires that all clients in the model policies. The system requires that all clients in the
environment be responsible for access control checks. environment be responsible for access control checks.
9.7. Security Considerations for Labeled NFS 9.6. Security Considerations for Labeled NFS
This entire chapter deals with security issues. This entire chapter deals with security issues.
Depending on the level of protection the MAC system offers there may Depending on the level of protection the MAC system offers there may
be a requirement to tightly bind the security attribute to the data. be a requirement to tightly bind the security attribute to the data.
When only one of the client or server enforces labels, it is When only one of the client or server enforces labels, it is
important to realize that the other side is not enforcing MAC important to realize that the other side is not enforcing MAC
protections. Alternate methods might be in use to handle the lack of protections. Alternate methods might be in use to handle the lack of
MAC support and care should be taken to identify and mitigate threats MAC support and care should be taken to identify and mitigate threats
skipping to change at page 44, line 28 skipping to change at page 45, line 44
fall back to the normal copy semantics. fall back to the normal copy semantics.
11.1.2.3. NFS4ERR_PARTNER_NO_AUTH (Error Code 10089) 11.1.2.3. NFS4ERR_PARTNER_NO_AUTH (Error Code 10089)
The source server does not authorize a server-to-server copy offload The source server does not authorize a server-to-server copy offload
operation. This may be due to the client's failure to send the operation. This may be due to the client's failure to send the
COPY_NOTIFY operation to the source server, the source server COPY_NOTIFY operation to the source server, the source server
receiving a server-to-server copy offload request after the copy receiving a server-to-server copy offload request after the copy
lease time expired, or for some other permission problem. lease time expired, or for some other permission problem.
The destination server does not authorize a server-to-server copy
offload operation. This may be due to an inter-server COPY request
where the destination server requires RPCSEC_GSSv3 and it is not
used, or some other permissions problem.
11.1.2.4. NFS4ERR_PARTNER_NOTSUPP (Error Code 10088) 11.1.2.4. NFS4ERR_PARTNER_NOTSUPP (Error Code 10088)
The remote server does not support the server-to-server copy offload The remote server does not support the server-to-server copy offload
protocol. protocol.
11.1.3. Labeled NFS Errors 11.1.3. Labeled NFS Errors
These errors are used in Labeled NFS. These errors are used in Labeled NFS.
11.1.3.1. NFS4ERR_BADLABEL (Error Code 10093) 11.1.3.1. NFS4ERR_BADLABEL (Error Code 10093)
skipping to change at page 45, line 26 skipping to change at page 46, line 52
| | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, |
| | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, |
| | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| CLONE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | CLONE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
| | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, |
| | NFS4ERR_NOSPC, NFS4ERR_OLD_STATEID, | | | NFS4ERR_NOSPC, NFS4ERR_OLD_STATEID, |
| | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE, | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE, |
| | NFS4ERR_XDEV | | | NFS4ERR_XDEV |
+----------------+--------------------------------------------------+
| COPY | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | COPY | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
| | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED, | | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOSPC, NFS4ERR_OFFLOAD_DENIED, | | | NFS4ERR_NOSPC, NFS4ERR_OFFLOAD_DENIED, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_PARTNER_NO_AUTH, | | | NFS4ERR_PARTNER_NO_AUTH, |
| | NFS4ERR_PARTNER_NOTSUPP, NFS4ERR_PNFS_IO_HOLE, | | | NFS4ERR_PARTNER_NOTSUPP, NFS4ERR_PNFS_IO_HOLE, |
| | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| COPY_NOTIFY | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | COPY_NOTIFY | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
| | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, | | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, |
| | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PNFS_IO_HOLE, | | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_PNFS_IO_HOLE, |
| | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_WRONG_TYPE | | | NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| DEALLOCATE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | DEALLOCATE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, |
| | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, |
| | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, | | | NFS4ERR_ROFS, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| GETDEVICELIST | NFS4ERR_NOTSUPP | | GETDEVICELIST | NFS4ERR_NOTSUPP |
+----------------+--------------------------------------------------+
| IO_ADVISE | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, |
| | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, |
| | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| LAYOUTERROR | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | LAYOUTERROR | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, |
| | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, |
| | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, |
| | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, |
| | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR, | | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, |
| | NFS4ERR_WRONG_TYPE | | | NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| LAYOUTSTATS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | LAYOUTSTATS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, |
| | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, | | | NFS4ERR_BAD_STATEID, NFS4ERR_DEADSESSION, |
| | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, | | | NFS4ERR_DELAY, NFS4ERR_DELEG_REVOKED, |
| | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, | | | NFS4ERR_EXPIRED, NFS4ERR_FHEXPIRED, |
| | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR, | | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_ISDIR, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, NFS4ERR_WRONG_CRED, |
| | NFS4ERR_WRONG_TYPE | | | NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| OFFLOAD_CANCEL | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | OFFLOAD_CANCEL | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, |
| | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY, | | | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY, |
| | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED, | | | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED, |
| | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP, | | | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS |
+----------------+--------------------------------------------------+
| OFFLOAD_STATUS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, | | OFFLOAD_STATUS | NFS4ERR_ADMIN_REVOKED, NFS4ERR_BADXDR, |
| | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY, | | | NFS4ERR_BAD_STATEID, NFS4ERR_COMPLETE_ALREADY, |
| | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED, | | | NFS4ERR_DEADSESSION, NFS4ERR_EXPIRED, |
| | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP, | | | NFS4ERR_DELAY, NFS4ERR_GRACE, NFS4ERR_NOTSUPP, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_SERVERFAULT, NFS4ERR_TOO_MANY_OPS |
+----------------+--------------------------------------------------+
| READ_PLUS | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | READ_PLUS | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
| | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, | | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, |
| | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_PNFS_IO_HOLE, NFS4ERR_PNFS_NO_LAYOUT, | | | NFS4ERR_PARTNER_NO_AUTH, NFS4ERR_PNFS_IO_HOLE, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_PNFS_NO_LAYOUT, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_WRONG_TYPE | | | NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| SEEK | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | SEEK | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
| | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, | | | NFS4ERR_ISDIR, NFS4ERR_IO, NFS4ERR_LOCKED, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, | | | NFS4ERR_NOTSUPP, NFS4ERR_OLD_STATEID, |
| | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OPENMODE, NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_PNFS_IO_HOLE, NFS4ERR_PNFS_NO_LAYOUT, | | | NFS4ERR_PNFS_IO_HOLE, NFS4ERR_PNFS_NO_LAYOUT, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_RETRY_UNCACHED_REP, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNION_NOTSUPP, NFS4ERR_WRONG_TYPE | | | NFS4ERR_UNION_NOTSUPP, NFS4ERR_WRONG_TYPE |
+----------------+--------------------------------------------------+
| WRITE_SAME | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | WRITE_SAME | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, | | | NFS4ERR_BADXDR, NFS4ERR_BAD_STATEID, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_DQUOT, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, NFS4ERR_INVAL, |
| | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED, | | | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_LOCKED, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NOSPC, NFS4ERR_NOTSUPP, | | | NFS4ERR_NOSPC, NFS4ERR_NOTSUPP, |
| | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, | | | NFS4ERR_OLD_STATEID, NFS4ERR_OPENMODE, |
skipping to change at page 49, line 31 skipping to change at page 51, line 33
12. New File Attributes 12. New File Attributes
12.1. New RECOMMENDED Attributes - List and Definition References 12.1. New RECOMMENDED Attributes - List and Definition References
The list of new RECOMMENDED attributes appears in Table 4. The The list of new RECOMMENDED attributes appears in Table 4. The
meanings of the columns of the table are: meanings of the columns of the table are:
Name: The name of the attribute. Name: The name of the attribute.
Id: The number assigned to the attribute. In the event of conflicts Id: The number assigned to the attribute. In the event of conflicts
between the assigned number and [NFSv42xdr], the latter is likely between the assigned number and
[I-D.ietf-nfsv4-minorversion2-dot-x], the latter is likely
authoritative, but should be resolved with Errata to this document authoritative, but should be resolved with Errata to this document
and/or [NFSv42xdr]. See [IESG08] for the Errata process. and/or [I-D.ietf-nfsv4-minorversion2-dot-x]. See [IESG08] for the
Errata process.
Data Type: The XDR data type of the attribute. Data Type: The XDR data type of the attribute.
Acc: Access allowed to the attribute. Acc: Access allowed to the attribute.
R means read-only (GETATTR may retrieve, SETATTR may not set). R means read-only (GETATTR may retrieve, SETATTR may not set).
W means write-only (SETATTR may set, GETATTR may not retrieve). W means write-only (SETATTR may set, GETATTR may not retrieve).
R W means read/write (GETATTR may retrieve, SETATTR may set). R W means read/write (GETATTR may retrieve, SETATTR may set).
Defined in: The section of this specification that describes the Defined in: The section of this specification that describes the
attribute. attribute.
+------------------+----+-------------------+-----+----------------+ +------------------+----+-------------------+-----+----------------+
| Name | Id | Data Type | Acc | Defined in | | Name | Id | Data Type | Acc | Defined in |
+------------------+----+-------------------+-----+----------------+ +------------------+----+-------------------+-----+----------------+
| clone_blksize | 77 | length4 | R | Section 12.2.1 | | clone_blksize | 77 | uint32_t | R | Section 12.2.1 |
| space_freed | 78 | length4 | R | Section 12.2.2 | | space_freed | 78 | length4 | R | Section 12.2.2 |
| change_attr_type | 79 | change_attr_type4 | R | Section 12.2.3 | | change_attr_type | 79 | change_attr_type4 | R | Section 12.2.3 |
| sec_label | 80 | sec_label4 | R W | Section 12.2.4 | | sec_label | 80 | sec_label4 | R W | Section 12.2.4 |
+------------------+----+-------------------+-----+----------------+ +------------------+----+-------------------+-----+----------------+
Table 4 Table 4
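The Acc column of Table 4 can be read mechanically. The following is a minimal sketch, assuming the right-hand (-40) values of the table; the names Attr, NEW_RECOMMENDED_ATTRS, getattr_may_retrieve, and setattr_may_set are hypothetical helpers and are not part of the protocol.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    attr_id: int     # "Id" column
    data_type: str   # "Data Type" column (XDR type name)
    acc: str         # "Acc" column: "R", "W", or "R W"


# Rows copied from Table 4 (the -40 column of the diff).
NEW_RECOMMENDED_ATTRS = {
    "clone_blksize": Attr(77, "uint32_t", "R"),
    "space_freed": Attr(78, "length4", "R"),
    "change_attr_type": Attr(79, "change_attr_type4", "R"),
    "sec_label": Attr(80, "sec_label4", "R W"),
}


def getattr_may_retrieve(name: str) -> bool:
    """True if the Acc column contains R (GETATTR may retrieve)."""
    return "R" in NEW_RECOMMENDED_ATTRS[name].acc.split()


def setattr_may_set(name: str) -> bool:
    """True if the Acc column contains W (SETATTR may set)."""
    return "W" in NEW_RECOMMENDED_ATTRS[name].acc.split()
```

Under this reading, sec_label is the only new attribute that SETATTR may set.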
12.2. Attribute Definitions 12.2. Attribute Definitions
12.2.1. Attribute 77: clone_blksize 12.2.1. Attribute 77: clone_blksize
skipping to change at page 52, line 29 skipping to change at page 54, line 29
}; };
<CODE ENDS> <CODE ENDS>
The FATTR4_SEC_LABEL contains an array of two components with the The FATTR4_SEC_LABEL contains an array of two components with the
first component being an LFS. It serves to provide the receiving end first component being an LFS. It serves to provide the receiving end
with the information necessary to translate the security attribute with the information necessary to translate the security attribute
into a form that is usable by the endpoint. Label Formats assigned into a form that is usable by the endpoint. Label Formats assigned
an LFS may optionally choose to include a Policy Identifier field to an LFS may optionally choose to include a Policy Identifier field to
allow for complex policy deployments. The LFS and Label Format allow for complex policy deployments. The LFS and Label Format
Registry are described in detail in [Quigley14]. The translation Registry are described in detail in [RFC7569]. The translation used
used to interpret the security attribute is not specified as part of to interpret the security attribute is not specified as part of the
the protocol as it may depend on various factors. The second protocol as it may depend on various factors. The second component
component is an opaque section which contains the data of the is an opaque section which contains the data of the attribute. This
attribute. This component is dependent on the MAC model to interpret component is dependent on the MAC model to interpret and enforce.
and enforce.
In particular, it is the responsibility of the LFS specification to In particular, it is the responsibility of the LFS specification to
define a maximum size for the opaque section, slai_data<>. When define a maximum size for the opaque section, slai_data<>. When
creating or modifying a label for an object, the client needs to be creating or modifying a label for an object, the client needs to be
guaranteed that the server will accept a label that is sized guaranteed that the server will accept a label that is sized
correctly. By both client and server being part of a specific MAC correctly. By both client and server being part of a specific MAC
model, the client will be aware of the size. model, the client will be aware of the size.
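The size rule above can be sketched as an encoder that refuses oversized labels. This is a sketch under stated assumptions: the label is taken to be two 32-bit unsigned integers (an LFS and a Policy Identifier) followed by the variable-length opaque slai_data<>, encoded per the usual XDR rules (4-byte length, data, zero padding to a multiple of 4). The function names and the max_len parameter are hypothetical; the actual maximum is defined by the LFS specification, not by this sketch.

```python
import struct


def encode_sec_label(lfs: int, pi: int, data: bytes, max_len: int) -> bytes:
    """Encode an assumed sec_label4 layout: lfs, pi, then opaque data."""
    if len(data) > max_len:
        raise ValueError("label exceeds the maximum size defined by the LFS")
    pad = (4 - len(data) % 4) % 4          # XDR pads opaques to 4 bytes
    return struct.pack(">III", lfs, pi, len(data)) + data + b"\x00" * pad


def decode_sec_label(buf: bytes, max_len: int):
    """Decode the same layout, rejecting oversized or truncated labels."""
    lfs, pi, length = struct.unpack_from(">III", buf, 0)
    if length > max_len or 12 + length > len(buf):
        raise ValueError("label is too large or the buffer is truncated")
    return lfs, pi, buf[12:12 + length]
```

Because client and server share the MAC model, both sides would pass the same max_len, so a label the client is able to encode is one the server is able to accept.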
13. Operations: REQUIRED, RECOMMENDED, or OPTIONAL 13. Operations: REQUIRED, RECOMMENDED, or OPTIONAL
skipping to change at page 54, line 4 skipping to change at page 55, line 50
third column of the table designates the feature(s) and if the third column of the table designates the feature(s) and if the
operation is REQUIRED or OPTIONAL in the presence of support for the operation is REQUIRED or OPTIONAL in the presence of support for the
feature. feature.
The OPTIONAL features identified and their abbreviations are as The OPTIONAL features identified and their abbreviations are as
follows: follows:
pNFS: Parallel NFS pNFS: Parallel NFS
FDELG: File Delegations FDELG: File Delegations
DDELG: Directory Delegations DDELG: Directory Delegations
COPYra: Intra-server Server Side Copy COPYra: Intra-server Server Side Copy
COPYer: Inter-server Server Side Copy COPYer: Inter-server Server Side Copy
ADB: Application Data Blocks ADB: Application Data Blocks
Operations Operations
+----------------------+---------------------+----------------------+ +----------------------+--------------------+-----------------------+
| Operation | EOL, REQ, REC, OPT, | Feature (REQ, REC, | | Operation | REQ, REC, OPT, or | Feature (REQ, REC, or |
| | or MNI | or OPT) | | | MNI | OPT) |
+----------------------+---------------------+----------------------+ +----------------------+--------------------+-----------------------+
| ALLOCATE | OPT | | | ALLOCATE | OPT | |
| ACCESS | REQ | | | ACCESS | REQ | |
| BACKCHANNEL_CTL | REQ | | | BACKCHANNEL_CTL | REQ | |
| BIND_CONN_TO_SESSION | REQ | | | BIND_CONN_TO_SESSION | REQ | |
| CLONE | OPT | | | CLONE | OPT | |
| CLOSE | REQ | | | CLOSE | REQ | |
| COMMIT | REQ | | | COMMIT | REQ | |
| COPY | OPT | COPYer (REQ), COPYra | | COPY | OPT | COPYer (REQ), COPYra |
| | | (REQ) | | | | (REQ) |
| COPY_NOTIFY | OPT | COPYer (REQ) | | COPY_NOTIFY | OPT | COPYer (REQ) |
| DEALLOCATE | OPT | | | DEALLOCATE | OPT | |
| CREATE | REQ | | | CREATE | REQ | |
| CREATE_SESSION | REQ | | | CREATE_SESSION | REQ | |
| DELEGPURGE | OPT | FDELG (REQ) | | DELEGPURGE | OPT | FDELG (REQ) |
| DELEGRETURN | OPT | FDELG, DDELG, pNFS | | DELEGRETURN | OPT | FDELG, DDELG, pNFS |
| | | (REQ) | | | | (REQ) |
| DESTROY_CLIENTID | REQ | | | DESTROY_CLIENTID | REQ | |
| DESTROY_SESSION | REQ | | | DESTROY_SESSION | REQ | |
| EXCHANGE_ID | REQ | | | EXCHANGE_ID | REQ | |
| FREE_STATEID | REQ | | | FREE_STATEID | REQ | |
| GETATTR | REQ | | | GETATTR | REQ | |
| GETDEVICEINFO | OPT | pNFS (REQ) | | GETDEVICEINFO | OPT | pNFS (REQ) |
| GETDEVICELIST | MNI | pNFS (MNI) | | GETDEVICELIST | MNI | pNFS (MNI) |
| GETFH | REQ | | | GETFH | REQ | |
| GET_DIR_DELEGATION | OPT | DDELG (REQ) | | GET_DIR_DELEGATION | OPT | DDELG (REQ) |
| LAYOUTCOMMIT | OPT | pNFS (REQ) | | ILLEGAL | REQ | |
| LAYOUTGET | OPT | pNFS (REQ) | | IO_ADVISE | OPT | |
| LAYOUTRETURN | OPT | pNFS (REQ) | | LAYOUTCOMMIT | OPT | pNFS (REQ) |
| LAYOUTERROR | OPT | pNFS (OPT) | | LAYOUTGET | OPT | pNFS (REQ) |
| LAYOUTSTATS | OPT | pNFS (OPT) | | LAYOUTRETURN | OPT | pNFS (REQ) |
| LINK | OPT | | | LAYOUTERROR | OPT | pNFS (OPT) |
| LOCK | REQ | | | LAYOUTSTATS | OPT | pNFS (OPT) |
| LOCKT | REQ | | | LINK | OPT | |
| LOCKU | REQ | | | LOCK | REQ | |
| LOOKUP | REQ | | | LOCKT | REQ | |
| LOOKUPP | REQ | | | LOCKU | REQ | |
| NVERIFY | REQ | | | LOOKUP | REQ | |
| OFFLOAD_CANCEL | OPT | COPYer (REQ), COPYra | | LOOKUPP | REQ | |
| | | (REQ) | | NVERIFY | REQ | |
| OFFLOAD_STATUS | OPT | COPYer (REQ), COPYra | | OFFLOAD_CANCEL | OPT | COPYer (OPT), COPYra |
| | | (REQ) | | | | (OPT) |
| OPEN | REQ | | | OFFLOAD_STATUS | OPT | COPYer (OPT), COPYra |
| OPENATTR | OPT | | | | | (OPT) |
| OPEN_CONFIRM | MNI | | | OPEN | REQ | |
| OPEN_DOWNGRADE | REQ | | | OPENATTR | OPT | |
| PUTFH | REQ | | | OPEN_CONFIRM | MNI | |
| PUTPUBFH | REQ | | | OPEN_DOWNGRADE | REQ | |
| PUTROOTFH | REQ | | | PUTFH | REQ | |
| READ | REQ | | | PUTPUBFH | REQ | |
| READDIR | REQ | | | PUTROOTFH | REQ | |
| READLINK | OPT | | | READ | REQ | |
| READ_PLUS | OPT | | | READDIR | REQ | |
| RECLAIM_COMPLETE | REQ | | | READLINK | OPT | |
| RELEASE_LOCKOWNER | MNI | | | READ_PLUS | OPT | |
| REMOVE | REQ | | | RECLAIM_COMPLETE | REQ | |
| RENAME | REQ | | | RELEASE_LOCKOWNER | MNI | |
| RENEW | MNI | | | REMOVE | REQ | |
| RESTOREFH | REQ | | | RENAME | REQ | |
| SAVEFH | REQ | | | RENEW | MNI | |
| SECINFO | REQ | | | RESTOREFH | REQ | |
| SECINFO_NO_NAME | REC | pNFS file layout | | SAVEFH | REQ | |
| | | (REQ) | | SECINFO | REQ | |
| SEEK | OPT | | | SECINFO_NO_NAME | REC | pNFS file layout |
| SEQUENCE | REQ | | | | | (REQ) |
| SETATTR | REQ | | | SEEK | OPT | |
| SETCLIENTID | MNI | | | SEQUENCE | REQ | |
| SETCLIENTID_CONFIRM | MNI | | | SETATTR | REQ | |
| SET_SSV | REQ | | | SETCLIENTID | MNI | |
| TEST_STATEID | REQ | | | SETCLIENTID_CONFIRM | MNI | |
| VERIFY | REQ | | | SET_SSV | REQ | |
| WANT_DELEGATION | OPT | FDELG (OPT) | | TEST_STATEID | REQ | |
| WRITE | REQ | | | VERIFY | REQ | |
| WRITE_SAME | OPT | ADB (REQ) | | WANT_DELEGATION | OPT | FDELG (OPT) |
+----------------------+---------------------+----------------------+ | WRITE | REQ | |
| WRITE_SAME | OPT | ADB (REQ) |
+----------------------+--------------------+-----------------------+
Table 5
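The second and third columns of the Operations table can be combined to decide whether a server has to implement an operation for a given set of supported features. The sketch below covers only a handful of rows, taken from the right-hand (-40) column; the names OPERATIONS and must_implement are hypothetical.

```python
# name: (status with no features, {feature: status when feature is supported})
OPERATIONS = {
    "GETATTR": ("REQ", {}),
    "IO_ADVISE": ("OPT", {}),
    "COPY": ("OPT", {"COPYer": "REQ", "COPYra": "REQ"}),
    "COPY_NOTIFY": ("OPT", {"COPYer": "REQ"}),
    "LAYOUTERROR": ("OPT", {"pNFS": "OPT"}),
    "WRITE_SAME": ("OPT", {"ADB": "REQ"}),
}


def must_implement(op: str, supported_features: set) -> bool:
    """True if op is REQ outright, or REQ for any supported feature."""
    base, per_feature = OPERATIONS[op]
    if base == "REQ":
        return True
    return any(per_feature.get(f) == "REQ" for f in supported_features)
```

For example, COPY_NOTIFY becomes mandatory only when Inter-server Server Side Copy (COPYer) is supported, while LAYOUTERROR remains optional even with pNFS.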
Callback Operations Callback Operations
+-------------------------+------------------+----------------------+ +-------------------------+------------------+----------------------+
| Operation | REQ, REC, OPT, | Feature (REQ, REC, | | Operation | REQ, REC, OPT, | Feature (REQ, REC, |
| | or MNI | or OPT) | | | or MNI | or OPT) |
+-------------------------+------------------+----------------------+ +-------------------------+------------------+----------------------+
| CB_OFFLOAD | OPT | COPYer (REQ), COPYra |
| | | (REQ) |
| CB_GETATTR | OPT | FDELG (REQ) | | CB_GETATTR | OPT | FDELG (REQ) |
| CB_ILLEGAL | REQ | |
| CB_LAYOUTRECALL | OPT | pNFS (REQ) | | CB_LAYOUTRECALL | OPT | pNFS (REQ) |
| CB_NOTIFY | OPT | DDELG (REQ) | | CB_NOTIFY | OPT | DDELG (REQ) |
| CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) |
| CB_NOTIFY_LOCK | OPT | | | CB_NOTIFY_LOCK | OPT | |
| CB_OFFLOAD | OPT | COPYer (REQ), COPYra |
| | | (REQ) |
| CB_PUSH_DELEG | OPT | FDELG (OPT) | | CB_PUSH_DELEG | OPT | FDELG (OPT) |
| CB_RECALL | OPT | FDELG, DDELG, pNFS | | CB_RECALL | OPT | FDELG, DDELG, pNFS |
| | | (REQ) | | | | (REQ) |
| CB_RECALL_ANY | OPT | FDELG, DDELG, pNFS | | CB_RECALL_ANY | OPT | FDELG, DDELG, pNFS |
| | | (REQ) | | | | (REQ) |
| CB_RECALL_SLOT | REQ | | | CB_RECALL_SLOT | REQ | |
| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS (REQ) | | CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS (REQ) |
| CB_SEQUENCE | OPT | FDELG, DDELG, pNFS | | CB_SEQUENCE | OPT | FDELG, DDELG, pNFS |
| | | (REQ) | | | | (REQ) |
| CB_WANTS_CANCELLED | OPT | FDELG, DDELG, pNFS | | CB_WANTS_CANCELLED | OPT | FDELG, DDELG, pNFS |
| | | (REQ) | | | | (REQ) |
+-------------------------+------------------+----------------------+ +-------------------------+------------------+----------------------+
Table 6
14. Modifications to NFSv4.1 Operations 14. Modifications to NFSv4.1 Operations
14.1. Operation 42: EXCHANGE_ID - Instantiate Client ID 14.1. Operation 42: EXCHANGE_ID - Instantiate Client ID
14.1.1. ARGUMENT 14.1.1. ARGUMENT
<CODE BEGINS> <CODE BEGINS>
/* new */ /* new */
const EXCHGID4_FLAG_SUPP_FENCE_OPS = 0x00000004; const EXCHGID4_FLAG_SUPP_FENCE_OPS = 0x00000004;
skipping to change at page 63, line 32 skipping to change at page 65, line 32
this requirement, then it MUST return an error of this requirement, then it MUST return an error of
NFS4ERR_OFFLOAD_NO_REQS and set cr_consecutive to be false. NFS4ERR_OFFLOAD_NO_REQS and set cr_consecutive to be false.
Likewise, if ca_synchronous is set, then the client has required that Likewise, if ca_synchronous is set, then the client has required that
the copy protocol selected MUST perform a synchronous copy. If the the copy protocol selected MUST perform a synchronous copy. If the
destination server cannot meet this requirement, then it MUST return destination server cannot meet this requirement, then it MUST return
an error of NFS4ERR_OFFLOAD_NO_REQS and set cr_synchronous to be an error of NFS4ERR_OFFLOAD_NO_REQS and set cr_synchronous to be
false. false.
If both are set by the client, then the destination SHOULD try to If both are set by the client, then the destination SHOULD try to
determine if it can respond to both requirements at the same time. determine if it can respond to both requirements at the same time.
If it cannot make that determination, it must set to false the one it If it cannot make that determination, it must set to true the one it
can and set to true the other. The client, upon getting an can and set to false the other. The client, upon getting an
NFS4ERR_OFFLOAD_NO_REQS error, has to examine both cr_consecutive and NFS4ERR_OFFLOAD_NO_REQS error, has to examine both cr_consecutive and
cr_synchronous against the respective values of ca_consecutive and cr_synchronous against the respective values of ca_consecutive and
ca_synchronous to determine the possible requirement not met. It ca_synchronous to determine the possible requirement not met. It
MUST be prepared for the destination server not being able to MUST be prepared for the destination server not being able to
determine both requirements at the same time. determine both requirements at the same time.
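The client-side comparison described above can be sketched as follows. The sketch assumes the -40 wording, in which the destination sets a cr_ flag to true for a requirement it can meet and to false for one it cannot; the function name unmet_requirements is hypothetical. Since the destination may be unable to evaluate both requirements together, the result is a list of possibly unmet requirements, not a definitive diagnosis.

```python
def unmet_requirements(ca_consecutive: bool, ca_synchronous: bool,
                       cr_consecutive: bool, cr_synchronous: bool) -> list:
    """Compare requested (ca_*) and reported (cr_*) flags after an
    NFS4ERR_OFFLOAD_NO_REQS error and list what may not have been met."""
    unmet = []
    if ca_consecutive and not cr_consecutive:
        unmet.append("consecutive")
    if ca_synchronous and not cr_synchronous:
        unmet.append("synchronous")
    return unmet
```

A client could use the returned list to decide which requirement to relax before re-requesting the copy.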
Upon receiving the NFS4ERR_OFFLOAD_NO_REQS error, the client has to Upon receiving the NFS4ERR_OFFLOAD_NO_REQS error, the client has to
determine if it wants to either re-request the copy with a relaxed determine if it wants to either re-request the copy with a relaxed
set of requirements or if it wants to revert to manually copying the set of requirements or if it wants to revert to manually copying the
data. If it decides to manually copy the data and this is a remote data. If it decides to manually copy the data and this is a remote
skipping to change at page 66, line 8 skipping to change at page 68, line 8
network location format. The server is not required to resolve the network location format. The server is not required to resolve the
cna_destination_server address before completing this operation. cna_destination_server address before completing this operation.
If this operation succeeds, the source server will allow the If this operation succeeds, the source server will allow the
cna_destination_server to copy the specified file on behalf of the cna_destination_server to copy the specified file on behalf of the
given user as long as both of the following conditions are met: given user as long as both of the following conditions are met:
o The destination server begins reading the source file before the o The destination server begins reading the source file before the
cnr_lease_time expires. If the cnr_lease_time expires while the cnr_lease_time expires. If the cnr_lease_time expires while the
destination server is still reading the source file, the destination server is still reading the source file, the
destination server is allowed to finish reading the file. destination server is allowed to finish reading the file. If the
cnr_lease_time expires before the destination server uses READ or
READ_PLUS to begin the transfer, the source server can use
NFS4ERR_PARTNER_NO_AUTH to inform the destination server that the
cnr_lease_time has expired.
o The client has not issued an OFFLOAD_CANCEL for the same o The client has not issued an OFFLOAD_CANCEL for the same
combination of user, filehandle, and destination server. combination of user, filehandle, and destination server.
The cnr_lease_time is chosen by the source server. A cnr_lease_time The cnr_lease_time is chosen by the source server. A cnr_lease_time
of 0 (zero) indicates an infinite lease. To avoid the need for of 0 (zero) indicates an infinite lease. To avoid the need for
synchronized clocks, copy lease times are granted by the server as a synchronized clocks, copy lease times are granted by the server as a
time delta. To renew the copy lease time the client should resend time delta. To renew the copy lease time the client should resend
the same copy notification request to the source server. the same copy notification request to the source server.
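The lease rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, its parameters, and the string status values are hypothetical, the lease is treated as expired once the elapsed time reaches cnr_lease_time, and the OFFLOAD_CANCEL condition is not modeled.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_PARTNER_NO_AUTH = "NFS4ERR_PARTNER_NO_AUTH"


def check_copy_lease(cnr_lease_time, seconds_since_grant, transfer_started):
    """Status a source server might return for a destination server's read.

    cnr_lease_time and seconds_since_grant are deltas in seconds, so no
    synchronized clocks are needed. transfer_started is True if the
    destination server began reading before the lease expired.
    """
    # A cnr_lease_time of 0 (zero) indicates an infinite lease.
    if cnr_lease_time == 0:
        return NFS4_OK
    # A transfer that began in time is allowed to finish after expiry.
    if transfer_started:
        return NFS4_OK
    # Assumption: ">=" marks expiry; the text does not define the boundary.
    if seconds_since_grant >= cnr_lease_time:
        return NFS4ERR_PARTNER_NO_AUTH
    return NFS4_OK
```

A client that wants to avoid the error path would resend the same copy notification request before the delta runs out, which renews the lease.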
The cnr_stateid is a copy stateid which uniquely describes the state The cnr_stateid is a copy stateid which uniquely describes the state
needed on the source server to track the proposed copy. As defined needed on the source server to track the proposed copy. As defined
in Section 8.2 of [RFC5661], a stateid is tied to the current in Section 8.2 of [RFC5661], a stateid is tied to the current
filehandle and if the same stateid is presented by two different filehandle and if the same stateid is presented by two different
clients, it may refer to different state. As the source does not clients, it may refer to different state. As the source does not
know which netloc4 network location the destinaton might use to know which netloc4 network location the destination might use to
establish the copy operation, it can use the cnr_stateid to identify establish the copy operation, it can use the cnr_stateid to identify
that the destination is operating on behalf of the client. Thus the that the destination is operating on behalf of the client. Thus the
source server MUST construct copy stateids such that they are source server MUST construct copy stateids such that they are
distinct from all other stateids handed out to clients. These copy distinct from all other stateids handed out to clients. These copy
stateids MUST denote the same set of locks as each of the earlier stateids MUST denote the same set of locks as each of the earlier
delegation, locking, and open states for the client on the given file delegation, locking, and open states for the client on the given file
(see Section 4.4.1). (see Section 4.4.1).
A successful response will also contain a list of netloc4 network A successful response will also contain a list of netloc4 network
location formats called cnr_source_server, on which the source is location formats called cnr_source_server, on which the source is
skipping to change at page 71, line 42 skipping to change at page 73, line 42
then free to use this information about client I/O to optimize the then free to use this information about client I/O to optimize the
data storage location. data storage location.
This hint is also useful in the case of NFS clients which are network This hint is also useful in the case of NFS clients which are network
booting from a server. If the first client to be booted sends this booting from a server. If the first client to be booted sends this
hint, then it keeps the cache warm for the remaining clients. hint, then it keeps the cache warm for the remaining clients.
15.5.6. pNFS File Layout Data Type Considerations 15.5.6. pNFS File Layout Data Type Considerations
The IO_ADVISE considerations for pNFS are very similar to the COMMIT The IO_ADVISE considerations for pNFS are very similar to the COMMIT
considerations for pNFS. That is, as with COMMIT, some NFS server considerations for pNFS (see Section 13.7 of [RFC5661]). That is, as
implementations prefer IO_ADVISE be done on the DS, and some prefer with COMMIT, some NFS server implementations prefer IO_ADVISE be done
it be done on the MDS. on the storage device, and some prefer it be done on the metadata
server.
For the file's layout type, it is proposed that NFSv4.2 include an
additional hint NFL42_CARE_IO_ADVISE_THRU_MDS which is valid only on
metadata servers running NFSv4.2 or higher. Any file's layout
obtained from a NFSv4.1 metadata server MUST NOT have
NFL42_UFLG_IO_ADVISE_THRU_MDS set. Any file's layout obtained with a
NFSv4.2 metadata server MAY have NFL42_UFLG_IO_ADVISE_THRU_MDS set.
However, if the layout utilizes NFSv4.1 storage devices, the For the file's layout type, NFSv4.2 includes an additional hint
IO_ADVISE operation cannot be sent to them. NFL42_CARE_IO_ADVISE_THRU_MDS which is valid only on metadata servers
running NFSv4.2 or higher. Any file's layout obtained from a NFSv4.1
metadata server MUST NOT have NFL42_UFLG_IO_ADVISE_THRU_MDS set. Any
file's layout obtained with a NFSv4.2 metadata server MAY have
NFL42_UFLG_IO_ADVISE_THRU_MDS set. However, if the layout utilizes
NFSv4.1 storage devices, the IO_ADVISE operation cannot be sent to
them.
If NFL42_UFLG_IO_ADVISE_THRU_MDS is set, the client MUST send the If NFL42_UFLG_IO_ADVISE_THRU_MDS is set, the client MUST send the
IO_ADVISE operation to the MDS in order for it to be honored by the IO_ADVISE operation to the metadata server in order for it to be
DS. Once the MDS receives the IO_ADVISE operation, it will honored by the storage device. Once the metadata server receives the
communicate the advice to each DS. IO_ADVISE operation, it will communicate the advice to each storage
device.
If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then the client SHOULD If NFL42_UFLG_IO_ADVISE_THRU_MDS is not set, then the client SHOULD
send an IO_ADVISE operation to the appropriate DS for the specified send an IO_ADVISE operation to the appropriate storage device for the
byte range. While the client MAY always send IO_ADVISE to the MDS, specified byte range. While the client MAY always send IO_ADVISE to
if the server has not set NFL42_UFLG_IO_ADVISE_THRU_MDS, the client the metadata server, if the server has not set
should expect that such an IO_ADVISE is futile. Note that a client NFL42_UFLG_IO_ADVISE_THRU_MDS, the client should expect that such an
SHOULD use the same set of arguments on each IO_ADVISE sent to a DS IO_ADVISE is futile. Note that a client SHOULD use the same set of
for the same open file reference. arguments on each IO_ADVISE sent to a storage device for the same
open file reference.
The server is not required to support different advice for different The server is not required to support different advice for different
DS's with the same open file reference. storage devices with the same open file reference.
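The client-side choice described above might look like the following sketch. The helper name and the returned labels are hypothetical, and the bit value used for the flag is an illustrative assumption; the real constant should be taken from the NFSv4.2 XDR description.

```python
# Illustrative bit value (an assumption for this sketch only).
NFL42_UFLG_IO_ADVISE_THRU_MDS = 0x4


def io_advise_target(layout_flags):
    """Pick where a client sends IO_ADVISE for a file layout."""
    if layout_flags & NFL42_UFLG_IO_ADVISE_THRU_MDS:
        # Flag set: the client MUST send IO_ADVISE to the metadata server,
        # which then communicates the advice to each storage device.
        return "metadata server"
    # Flag not set: the client SHOULD send IO_ADVISE to the storage device
    # that serves the byte range; sending it to the metadata server is
    # permitted but expected to be futile.
    return "storage device"
```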
15.5.6.1. Dense and Sparse Packing Considerations 15.5.6.1. Dense and Sparse Packing Considerations
The IO_ADVISE operation MUST use the iar_offset and byte range as The IO_ADVISE operation MUST use the iar_offset and byte range as
dictated by the presence or absence of NFL4_UFLG_DENSE. dictated by the presence or absence of NFL4_UFLG_DENSE (see
Section 13.4.4 of [RFC5661]).
E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the DS E.g., if NFL4_UFLG_DENSE is present, and a READ or WRITE to the
for iaa_offset 0 really means iaa_offset 10000 in the logical file, storage device for iaa_offset 0 really means iaa_offset 10000 in the
then an IO_ADVISE for iaa_offset 0 means iaa_offset 10000. logical file, then an IO_ADVISE for iaa_offset 0 means iaa_offset
10000.
E.g., if NFL4_UFLG_DENSE is absent, then a READ or WRITE to the DS E.g., if NFL4_UFLG_DENSE is absent, then a READ or WRITE to the
for iaa_offset 0 really means iaa_offset 0 in the logical file, then storage device for iaa_offset 0 really means iaa_offset 0 in the
an IO_ADVISE for iaa_offset 0 means iaa_offset 0 in the logical file. logical file, then an IO_ADVISE for iaa_offset 0 means iaa_offset 0
in the logical file.
E.g., if NFL4_UFLG_DENSE is present, the stripe unit is 1000 bytes E.g., if NFL4_UFLG_DENSE is present, the stripe unit is 1000 bytes
and the stripe count is 10, and the dense DS file is serving and the stripe count is 10, and the dense storage device file is
iar_offset 0. A READ or WRITE to the DS for iaa_offsets 0, 1000, serving iar_offset 0. A READ or WRITE to the storage device for
2000, and 3000, really mean iaa_offsets 10000, 20000, 30000, and iaa_offsets 0, 1000, 2000, and 3000, really mean iaa_offsets 10000,
40000 (implying a stripe count of 10 and a stripe unit of 1000), then 20000, 30000, and 40000 (implying a stripe count of 10 and a stripe
an IO_ADVISE sent to the same DS with an iaa_offset of 500, and an unit of 1000), then an IO_ADVISE sent to the same storage device with
iaa_count of 3000 means that the IO_ADVISE applies to these byte an iaa_offset of 500, and an iaa_count of 3000 means that the
ranges of the dense DS file: IO_ADVISE applies to these byte ranges of the dense storage device
file:
- 500 to 999 - 500 to 999
- 1000 to 1999 - 1000 to 1999
- 2000 to 2999 - 2000 to 2999
- 3000 to 3499 - 3000 to 3499
I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE. I.e., the contiguous range 500 to 3499 as specified in IO_ADVISE.
It also applies to these byte ranges of the logical file: It also applies to these byte ranges of the logical file:
- 10500 to 10999 (500 bytes) - 10500 to 10999 (500 bytes)
- 20000 to 20999 (1000 bytes) - 20000 to 20999 (1000 bytes)
- 30000 to 30999 (1000 bytes) - 30000 to 30999 (1000 bytes)
- 40000 to 40499 (500 bytes) - 40000 to 40499 (500 bytes)
(total 3000 bytes) (total 3000 bytes)
E.g., if NFL4_UFLG_DENSE is absent, the stripe unit is 250 bytes, the E.g., if NFL4_UFLG_DENSE is absent, the stripe unit is 250 bytes, the
stripe count is 4, and the sparse DS file is serving iaa_offset 0. stripe count is 4, and the sparse storage device file is serving
Then a READ or WRITE to the DS for iaa_offsets 0, 1000, 2000, and iaa_offset 0. Then a READ or WRITE to the storage device for
3000, really means iaa_offsets 0, 1000, 2000, and 3000 in the logical iaa_offsets 0, 1000, 2000, and 3000, really means iaa_offsets 0,
file, keeping in mind that on the DS file, byte ranges 250 to 999, 1000, 2000, and 3000 in the logical file, keeping in mind that on the
1250 to 1999, 2250 to 2999, and 3250 to 3999 are not accessible. storage device file, byte ranges 250 to 999, 1250 to 1999, 2250 to
Then an IO_ADVISE sent to the same DS with an iaa_offset of 500, and 2999, and 3250 to 3999 are not accessible. Then an IO_ADVISE sent to
a iaa_count of 3000 means that the IO_ADVISE applies to these byte the same storage device with an iaa_offset of 500, and a iaa_count of
ranges of the logical file and the sparse DS file: 3000 means that the IO_ADVISE applies to these byte ranges of the
logical file and the sparse storage device file:
- 500 to 999 (500 bytes) - no effect - 500 to 999 (500 bytes) - no effect
- 1000 to 1249 (250 bytes) - effective - 1000 to 1249 (250 bytes) - effective
- 1250 to 1999 (750 bytes) - no effect - 1250 to 1999 (750 bytes) - no effect
- 2000 to 2249 (250 bytes) - effective - 2000 to 2249 (250 bytes) - effective
- 2250 to 2999 (750 bytes) - no effect - 2250 to 2999 (750 bytes) - no effect
- 3000 to 3249 (250 bytes) - effective - 3000 to 3249 (250 bytes) - effective
- 3250 to 3499 (250 bytes) - no effect - 3250 to 3499 (250 bytes) - no effect
(subtotal 2250 bytes) - no effect (subtotal 2250 bytes) - no effect
(subtotal 750 bytes) - effective (subtotal 750 bytes) - effective
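The two worked examples above can be reproduced with a short sketch. The function names and the first_logical parameter (the logical offset that corresponds to offset 0 of the storage device file) are assumptions introduced for illustration; they are not protocol fields.

```python
def dense_logical_ranges(offset, count, stripe_unit, stripe_count,
                         first_logical):
    """Map an IO_ADVISE range on a dense storage device file to the
    inclusive byte ranges of the logical file it covers."""
    ranges = []
    pos, end = offset, offset + count
    while pos < end:
        stripe, within = divmod(pos, stripe_unit)
        # Stop at the end of the current stripe unit or of the request.
        chunk_end = min((stripe + 1) * stripe_unit, end)
        start = first_logical + stripe * stripe_unit * stripe_count + within
        ranges.append((start, start + (chunk_end - pos) - 1))
        pos = chunk_end
    return ranges


def sparse_effective_ranges(offset, count, stripe_unit, stripe_count,
                            first_logical=0):
    """Return the inclusive byte ranges of an IO_ADVISE range that fall on
    stripe units actually served by a sparse storage device file."""
    period = stripe_unit * stripe_count
    end = offset + count
    k = max(0, (offset - first_logical) // period)
    ranges = []
    while first_logical + k * period < end:
        lo = max(offset, first_logical + k * period)
        hi = min(end, first_logical + k * period + stripe_unit)
        if lo < hi:
            ranges.append((lo, hi - 1))
        k += 1
    return ranges


# Dense example: stripe unit 1000, stripe count 10, offset 500, count 3000.
dense = dense_logical_ranges(500, 3000, 1000, 10, first_logical=10000)
# Sparse example: stripe unit 250, stripe count 4, offset 500, count 3000.
sparse = sparse_effective_ranges(500, 3000, 250, 4)
```

With the example parameters, dense holds the four logical ranges listed for the dense case (3000 bytes in total), and sparse holds the three "effective" ranges listed for the sparse case (750 bytes in total).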
skipping to change at page 74, line 33 skipping to change at page 76, line 40
struct LAYOUTERROR4res { struct LAYOUTERROR4res {
nfsstat4 ler_status; nfsstat4 ler_status;
}; };
<CODE ENDS> <CODE ENDS>
15.6.3. DESCRIPTION 15.6.3. DESCRIPTION
The client can use LAYOUTERROR to inform the metadata server about The client can use LAYOUTERROR to inform the metadata server about
errors in its interaction with the layout represented by the current errors in its interaction with the layout (see Section 12 of
filehandle, client ID (derived from the session ID in the preceding [RFC5661]) represented by the current filehandle, client ID (derived
SEQUENCE operation), byte-range (lea_offset + lea_length), and from the session ID in the preceding SEQUENCE operation), byte-range
lea_stateid. (lea_offset + lea_length), and lea_stateid.
Each individual device_error4 describes a single error associated Each individual device_error4 describes a single error associated
with a storage device, which is identified via de_deviceid. If the with a storage device, which is identified via de_deviceid. If the
Layout Type supports NFSv4 operations, then the operation which Layout Type (see Section 12.2.7 of [RFC5661]) supports NFSv4
returned the error is identified via de_opnum. If the Layout Type operations, then the operation which returned the error is identified
does not support NFSv4 operations, then it MAY choose to either map via de_opnum. If the Layout Type does not support NFSv4 operations,
the operation onto one of the allowed operations which can be sent to then it MAY choose to either map the operation onto one of the allowed
a storage device with the File Layout Type (see Section 3.3) or it operations which can be sent to a storage device with the File Layout
can signal no support for operations by marking de_opnum with the Type (see Section 3.3) or it can signal no support for operations by
ILLEGAL operation. Finally the NFS error value (nfsstat4) marking de_opnum with the ILLEGAL operation. Finally the NFS error
encountered is provided via de_status and may consist of the value (nfsstat4) encountered is provided via de_status and may
following error codes: consist of the following error codes:
NFS4ERR_NXIO: The client was unable to establish any communication NFS4ERR_NXIO: The client was unable to establish any communication
with the storage device. with the storage device.
NFS4ERR_*: The client was able to establish communication with the NFS4ERR_*: The client was able to establish communication with the
storage device and is returning one of the allowed error codes for storage device and is returning one of the allowed error codes for
the operation denoted by de_opnum. the operation denoted by de_opnum.
Note that while the metadata server may return an error associated Note that while the metadata server may return an error associated
with the layout stateid or the open file, it MUST NOT return an error with the layout stateid or the open file, it MUST NOT return an error
skipping to change at page 77, line 40 skipping to change at page 79, line 47
struct LAYOUTSTATS4res { struct LAYOUTSTATS4res {
nfsstat4 lsr_status; nfsstat4 lsr_status;
}; };
<CODE ENDS> <CODE ENDS>
15.7.3. DESCRIPTION 15.7.3. DESCRIPTION
The client can use LAYOUTSTATS to inform the metadata server about The client can use LAYOUTSTATS to inform the metadata server about
its interaction with the layout represented by the current its interaction with the layout (see Section 12 of [RFC5661])
filehandle, client ID (derived from the session ID in the preceding represented by the current filehandle, client ID (derived from the
SEQUENCE operation), byte-range (lsa_offset and lsa_length), and session ID in the preceding SEQUENCE operation), byte-range
lsa_stateid. lsa_read and lsa_write allow for non-Layout Type (lsa_offset and lsa_length), and lsa_stateid. lsa_read and lsa_write
specific statistics to be reported. lsa_deviceid allows the client allow for non-Layout Type specific statistics to be reported.
to specify to which storage device the statistics apply. The lsa_deviceid allows the client to specify to which storage device the
remaining information the client is presenting is specific to the statistics apply. The remaining information the client is presenting
Layout Type and presented in the lsa_layoutupdate field. Each Layout is specific to the Layout Type and presented in the lsa_layoutupdate
Type MUST define the contents of lsa_layoutupdate in their respective field. Each Layout Type MUST define the contents of lsa_layoutupdate
specifications. in their respective specifications.
LAYOUTSTATS can be combined with IO_ADVISE (see Section 15.5) to LAYOUTSTATS can be combined with IO_ADVISE (see Section 15.5) to
augment the decision making process of how the metadata server augment the decision making process of how the metadata server
handles a file. I.e., IO_ADVISE lets the server know that a byte handles a file. I.e., IO_ADVISE lets the server know that a byte
range has a certain characteristic, but not necessarily the intensity range has a certain characteristic, but not necessarily the intensity
of that characteristic. of that characteristic.
The statistics are cumulative, i.e., multiple LAYOUTSTATS updates can The statistics are cumulative, i.e., multiple LAYOUTSTATS updates can
be in flight at the same time. The metadata server can examine the be in flight at the same time. The metadata server can examine the
packet's timestamp to order the different calls. The first packet's timestamp to order the different calls. The first
skipping to change at page 82, line 44 skipping to change at page 84, line 44
the client. the client.
If the client specifies a rpa_count value of zero, the READ_PLUS If the client specifies a rpa_count value of zero, the READ_PLUS
succeeds and returns zero bytes of data. In all situations, the succeeds and returns zero bytes of data. In all situations, the
server may choose to return fewer bytes than specified by the client. server may choose to return fewer bytes than specified by the client.
The client needs to check for this condition and handle the condition The client needs to check for this condition and handle the condition
appropriately. appropriately.
If the client specifies an rpa_offset and rpa_count value that is If the client specifies an rpa_offset and rpa_count value that is
entirely contained within a hole of the file, then the di_offset and entirely contained within a hole of the file, then the di_offset and
di_length returned MAY be for the entire hole. If the the owner has di_length returned MAY be for the entire hole. If the owner has a
a locked byte range covering rpa_offset and rpa_count entirely the locked byte range covering rpa_offset and rpa_count entirely the
di_offset and di_length MUST NOT be extended outside the locked byte di_offset and di_length MUST NOT be extended outside the locked byte
range. This result is considered valid until the file is changed range. This result is considered valid until the file is changed
(detected via the change attribute). The server MUST provide the (detected via the change attribute). The server MUST provide the
same semantics for the hole as if the client read the region and same semantics for the hole as if the client read the region and
received zeroes; the implied hole's contents lifetime MUST be exactly received zeroes; the implied hole's contents lifetime MUST be exactly
the same as any other read data. the same as any other read data.
If the client specifies an rpa_offset and rpa_count value that begins If the client specifies an rpa_offset and rpa_count value that begins
in a non-hole of the file but extends into a hole, the server should in a non-hole of the file but extends into a hole, the server should
return an array comprised of both data and a hole. The client MUST return an array comprised of both data and a hole. The client MUST
skipping to change at page 84, line 42 skipping to change at page 86, line 42
| Byte-Range | Contents | | Byte-Range | Contents |
+-------------+----------+ +-------------+----------+
| 0-15999 | Hole | | 0-15999 | Hole |
| 16K-31999 | Non-Zero | | 16K-31999 | Non-Zero |
| 32K-255999 | Hole | | 32K-255999 | Hole |
| 256K-287999 | Non-Zero | | 256K-287999 | Non-Zero |
| 288K-353999 | Hole | | 288K-353999 | Hole |
| 354K-417999 | Non-Zero | | 354K-417999 | Non-Zero |
+-------------+----------+ +-------------+----------+
Table 5 Table 7
Under the given circumstances, if a client was to read from the file Under the given circumstances, if a client was to read from the file
with a max read size of 64K, the following will be the results for with a max read size of 64K, the following will be the results for
the given READ_PLUS calls. This assumes the client has already the given READ_PLUS calls. This assumes the client has already
opened the file, acquired a valid stateid ('s' in the example), and opened the file, acquired a valid stateid ('s' in the example), and
just needs to issue READ_PLUS requests. just needs to issue READ_PLUS requests.
1. READ_PLUS(s, 0, 64K) --> NFS_OK, eof = false, <data[0,32K], 1. READ_PLUS(s, 0, 64K) --> NFS_OK, eof = false, <data[0,32K],
hole[32K,224K]>. Since the first hole is less than the server's hole[32K,224K]>. Since the first hole is less than the server's
minimum hole size, the first 32K of the file is returned as data minimum hole size, the first 32K of the file is returned as data
skipping to change at page 91, line 16 skipping to change at page 93, line 16
The CLONE operation is used to clone file content from a source file The CLONE operation is used to clone file content from a source file
specified by the SAVED_FH value into a destination file specified by specified by the SAVED_FH value into a destination file specified by
CURRENT_FH without actually copying the data, e.g., by using a copy- CURRENT_FH without actually copying the data, e.g., by using a copy-
on-write mechanism. on-write mechanism.
Both SAVED_FH and CURRENT_FH must be regular files. If either Both SAVED_FH and CURRENT_FH must be regular files. If either
SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail
and return NFS4ERR_WRONG_TYPE. and return NFS4ERR_WRONG_TYPE.
SAVED_FH and CURRENT_FH must be different files. If SAVED_FH and
CURRENT_FH refer to the same file, the operation MUST fail with
NFS4ERR_INVAL.
The ca_dst_stateid MUST refer to a stateid that is valid for a WRITE The ca_dst_stateid MUST refer to a stateid that is valid for a WRITE
operation and follows the rules for stateids in Sections 8.2.5 and operation and follows the rules for stateids in Sections 8.2.5 and
18.32.3 of [RFC5661]. The ca_src_stateid MUST refer to a stateid 18.32.3 of [RFC5661]. The ca_src_stateid MUST refer to a stateid
that is valid for a READ operation and follows the rules for that is valid for a READ operation and follows the rules for
stateids in Sections 8.2.5 and 18.22.3 of [RFC5661]. If either stateids in Sections 8.2.5 and 18.22.3 of [RFC5661]. If either
stateid is invalid, then the operation MUST fail. stateid is invalid, then the operation MUST fail.
The cl_src_offset is the starting offset within the source file from The cl_src_offset is the starting offset within the source file from
which the data to be cloned will be obtained and the cl_dst_offset is which the data to be cloned will be obtained and the cl_dst_offset is
the starting offset of the target region into which the cloned data the starting offset of the target region into which the cloned data
skipping to change at page 91, line 45 skipping to change at page 93, line 41
cl_dst_offset must be aligned to the clone block size Section 12.2.1. cl_dst_offset must be aligned to the clone block size Section 12.2.1.
The number of bytes to be cloned must be a multiple of the clone The number of bytes to be cloned must be a multiple of the clone
block size, except in the case in which cl_src_offset plus the number block size, except in the case in which cl_src_offset plus the number
of bytes to be cloned is equal to the source file size. of bytes to be cloned is equal to the source file size.
If the source offset or the source offset plus count is greater than If the source offset or the source offset plus count is greater than
the size of the source file, the operation MUST fail with the size of the source file, the operation MUST fail with
NFS4ERR_INVAL. The destination offset or destination offset plus NFS4ERR_INVAL. The destination offset or destination offset plus
count may be greater than the size of the destination file. count may be greater than the size of the destination file.
If SAVED_FH and CURRENT_FH refer to the same file and the source and
target ranges overlap, the operation MUST fail with NFS4ERR_INVAL.
If the target area of the clone operation ends beyond the end of the If the target area of the clone operation ends beyond the end of the
destination file, the offset at the end of the target area will destination file, the offset at the end of the target area will
determine the new size of the destination file. The contents of any determine the new size of the destination file. The contents of any
block not part of the target area will be the same as if the file block not part of the target area will be the same as if the file
size were extended by a WRITE. size were extended by a WRITE.
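The argument checks for CLONE, as worded in the -40 column, can be sketched as follows. The function name, parameter names, and tuple return value are hypothetical. Two points are assumptions rather than statements from the text shown here: that cl_src_offset must be aligned like cl_dst_offset, and that alignment and length violations are reported with NFS4ERR_INVAL. The sketch also assumes a non-zero byte count and does not model the zeroing of a trailing partial block.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"


def check_clone(src_offset, dst_offset, count, src_size, dst_size,
                clone_blksize, same_file):
    """Return (status, resulting destination file size)."""
    src_end = src_offset + count
    # Offsets must be aligned to the clone block size
    # (source alignment and the error code are assumptions).
    if src_offset % clone_blksize or dst_offset % clone_blksize:
        return NFS4ERR_INVAL, dst_size
    # The source range must lie within the source file.
    if src_offset > src_size or src_end > src_size:
        return NFS4ERR_INVAL, dst_size
    # The length must be a multiple of the clone block size unless the
    # range ends exactly at the end of the source file.
    if count % clone_blksize and src_end != src_size:
        return NFS4ERR_INVAL, dst_size
    # Same file with overlapping source and target ranges is rejected.
    if same_file and src_offset < dst_offset + count and dst_offset < src_end:
        return NFS4ERR_INVAL, dst_size
    # A target area ending beyond EOF determines the new file size.
    return NFS4_OK, max(dst_size, dst_offset + count)
```

Note that the destination range is deliberately not checked against dst_size, since the destination offset or offset plus count may exceed the size of the destination file.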
If the area to be cloned is not a multiple of the clone block size If the area to be cloned is not a multiple of the clone block size
and the size of the destination file is past the end of the target and the size of the destination file is past the end of the target
area, the area between the end of the target area and the next area, the area between the end of the target area and the next
multiple o the clone block size wlll be zeroed. multiple of the clone block size will be zeroed.
The CLONE operation is atomic in that other operations may not see The CLONE operation is atomic in that other operations may not see
see any intermediate states between the state of the two files before any intermediate states between the state of the two files before the
the operation and that after the operation. READs of the destination operation and that after the operation. READs of the destination
file will never see some blocks of the target area cloned without all file will never see some blocks of the target area cloned without all
of them being cloned. WRITEs of the source area will either have no of them being cloned. WRITEs of the source area will either have no
effect on the data of the target file or be fully reflected in the effect on the data of the target file or be fully reflected in the
target area of the destination file. target area of the destination file.
The completion status of the operation is indicated by cr_status. The completion status of the operation is indicated by cr_status.
16. NFSv4.2 Callback Operations 16. NFSv4.2 Callback Operations
16.1. Operation 15: CB_OFFLOAD - Report results of an asynchronous 16.1. Operation 15: CB_OFFLOAD - Report results of an asynchronous
skipping to change at page 93, line 50 skipping to change at page 95, line 50
COPY: the total number of bytes copied COPY: the total number of bytes copied
WRITE_SAME: the same information that a synchronous WRITE_SAME would WRITE_SAME: the same information that a synchronous WRITE_SAME would
provide provide
17. Security Considerations 17. Security Considerations
NFSv4.2 has all of the security concerns present in NFSv4.1 (see NFSv4.2 has all of the security concerns present in NFSv4.1 (see
Section 21 of [RFC5661]) and those present in the Server Side Copy Section 21 of [RFC5661]) and those present in the Server Side Copy
(see Section 4.10) and in Labeled NFS (see Section 9.7). (see Section 4.10) and in Labeled NFS (see Section 9.6).
18. IANA Considerations 18. IANA Considerations
The IANA Considerations for Labeled NFS are addressed in [Quigley14]. The IANA Considerations for Labeled NFS are addressed in [RFC7569].
19. References 19. References
19.1. Normative References 19.1. Normative References
[NFSv42xdr] [I-D.ietf-nfsv4-minorversion2-dot-x]
Haynes, T., "Network File System (NFS) Version 4 Minor Haynes, T., "NFSv4 Minor Version 2 Protocol External Data
Version 2 External Data Representation Standard (XDR) Representation Standard (XDR) Description", draft-ietf-
Description", December 2014. nfsv4-minorversion2-dot-x-40 (work in progress), January
2016.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, RFC Resource Identifier (URI): Generic Syntax", STD 66, RFC
3986, January 2005. 3986, January 2005.
[RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File [RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File
System (NFS) Version 4 Minor Version 1 Protocol", RFC System (NFS) Version 4 Minor Version 1 Protocol", RFC
5661, January 2010. 5661, January 2010.
[RFC5662] Shepler, S., Eisler, M., and D. Noveck, "Network File [RFC5662] Shepler, S., Eisler, M., and D. Noveck, "Network File
System (NFS) Version 4 Minor Version 1 External Data System (NFS) Version 4 Minor Version 1 External Data
Representation Standard (XDR) Description", RFC 5662, Representation Standard (XDR) Description", RFC 5662,
January 2010. January 2010.
[RFC7569] Quigley, D., Lu, J., and T. Haynes, "Registry
Specification for Mandatory Access Control (MAC) Security
Label Formats", RFC 7569, July 2015.
[posix_fadvise] [posix_fadvise]
The Open Group, "Section 'posix_fadvise()' of System The Open Group, "Section 'posix_fadvise()' of System
Interfaces of The Open Group Base Specifications Issue 6, Interfaces of The Open Group Base Specifications Issue 6,
IEEE Std 1003.1, 2004 Edition", 2004. IEEE Std 1003.1, 2004 Edition", 2004.
[posix_fallocate] [posix_fallocate]
The Open Group, "Section 'posix_fallocate()' of System The Open Group, "Section 'posix_fallocate()' of System
Interfaces of The Open Group Base Specifications Issue 6, Interfaces of The Open Group Base Specifications Issue 6,
IEEE Std 1003.1, 2004 Edition", 2004. IEEE Std 1003.1, 2004 Edition", 2004.
skipping to change at page 95, line 26 skipping to change at page 97, line 33
2008. 2008.
[McDougall07] [McDougall07]
McDougall, R. and J. Mauro, "Section 11.4.3, Detecting McDougall, R. and J. Mauro, "Section 11.4.3, Detecting
Memory Corruption of Solaris Internals", 2007. Memory Corruption of Solaris Internals", 2007.
[NFSv4-Versioning] [NFSv4-Versioning]
Haynes, T. and D. Noveck, "NFSv4 Version Management", Haynes, T. and D. Noveck, "NFSv4 Version Management",
November 2014. November 2014.
[Quigley14]
Quigley, D., Lu, J., and T. Haynes, "Registry
Specification for Mandatory Access Control (MAC) Security
Label Formats", draft-ietf-nfsv4-lfs-registry-01 (work in
progress), September 2014.
[RFC1108] Kent, S., "Security Options for the Internet Protocol", [RFC1108] Kent, S., "Security Options for the Internet Protocol",
RFC 1108, November 1991. RFC 1108, November 1991.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", March 1997. Requirement Levels", March 1997.
[RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998. Internet Protocol", RFC 2401, November 1998.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC4506] Eisler, M., "XDR: External Data Representation Standard", [RFC4506] Eisler, M., "XDR: External Data Representation Standard",
RFC 4506, May 2006. RFC 4506, May 2006.
[RFC4949] Shirey, R., "Internet Security Glossary, Version 2", RFC
4949, August 2007.
[RFC5663] Black, D., Fridella, S., and J. Glasgow, "Parallel NFS [RFC5663] Black, D., Fridella, S., and J. Glasgow, "Parallel NFS
(pNFS) Block/Volume Layout", RFC 5663, January 2010. (pNFS) Block/Volume Layout", RFC 5663, January 2010.
[RFC7204] Haynes, T., "Requirements for Labeled NFS", RFC 7204, [RFC7204] Haynes, T., "Requirements for Labeled NFS", RFC 7204,
April 2014. April 2014.
[RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
Protocol (HTTP/1.1): Message Syntax and Routing", RFC
7230, DOI 10.17487/RFC7230, June 2014,
<http://www.rfc-editor.org/info/rfc7230>.
[RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS) [RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS)
version 4 Protocol", RFC 7530, March 2015. version 4 Protocol", RFC 7530, March 2015.
[RFC959] Postel, J. and J. Reynolds, "File Transfer Protocol", STD [RFC959] Postel, J. and J. Reynolds, "File Transfer Protocol", STD
9, RFC 959, October 1985. 9, RFC 959, October 1985.
[Strohm11] [Strohm11]
Strohm, R., "Chapter 2, Data Blocks, Extents, and Strohm, R., "Chapter 2, Data Blocks, Extents, and
Segments, of Oracle Database Concepts 11g Release 1 Segments, of Oracle Database Concepts 11g Release 1
(11.1)", January 2011. (11.1)", January 2011.
skipping to change at page 97, line 20 skipping to change at page 99, line 24
Christoph Hellwig was very helpful in getting the WRITE_SAME Christoph Hellwig was very helpful in getting the WRITE_SAME
semantics to model more of what T10 was doing for WRITE SAME (10) semantics to model more of what T10 was doing for WRITE SAME (10)
[T10-SBC2]. And he led the push to get space reservations to more [T10-SBC2]. And he led the push to get space reservations to more
closely model posix_fallocate(). closely model posix_fallocate().
Andy Adamson picked up the RPCSEC_GSSv3 work, which enabled both Andy Adamson picked up the RPCSEC_GSSv3 work, which enabled both
Labeled NFS and Server Side Copy to present more secure options. Labeled NFS and Server Side Copy to present more secure options.
Christoph Hellwig provided the update to GETDEVICELIST. Christoph Hellwig provided the update to GETDEVICELIST.
Jorge Mora provided a very detailed review and caught some important
issues with the tables.
During the review process, Talia Reyes-Ortiz helped the sessions run During the review process, Talia Reyes-Ortiz helped the sessions run
smoothly. While many people contributed here and there, the core smoothly. While many people contributed here and there, the core
reviewers were Andy Adamson, Pranoop Erasani, Bruce Fields, Chuck reviewers were Andy Adamson, Pranoop Erasani, Bruce Fields, Chuck
Lever, Trond Myklebust, David Noveck, Peter Staubach, and Mike Lever, Trond Myklebust, David Noveck, Peter Staubach, and Mike
Kupfer. Kupfer.
Appendix B. RFC Editor Notes Appendix B. RFC Editor Notes
[RFC Editor: please remove this section prior to publishing this [RFC Editor: please remove this section prior to publishing this
document as an RFC] document as an RFC]
[RFC Editor: prior to publishing this document as an RFC, please [RFC Editor: prior to publishing this document as an RFC, please
replace all occurrences of NFSv42xdr with RFCxxxx where xxxx is the replace all occurrences of I-D.ietf-nfsv4-minorversion2-dot-x with
RFC number of the companion XDR document] RFCxxxx where xxxx is the RFC number of the companion XDR document]
Author's Address Author's Address
Thomas Haynes Thomas Haynes
Primary Data, Inc. Primary Data, Inc.
4300 El Camino Real Ste 100 4300 El Camino Real Ste 100
Los Altos, CA 94022 Los Altos, CA 94022
USA USA
Phone: +1 408 215 1519 Phone: +1 408 215 1519
 End of changes. 183 change blocks. 
514 lines changed or deleted 626 lines changed or added
