draft-ietf-nfsv4-minorversion1-17.txt   draft-ietf-nfsv4-minorversion1-18.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: May 22, 2008 Editors Expires: June 24, 2008 Editors
November 19, 2007 December 22, 2007
NFSv4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-17.txt draft-ietf-nfsv4-minorversion1-18.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 22, 2008. This Internet-Draft will expire on June 24, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2007).
Abstract Abstract
draft-ietf-nfsv4-minorversion1-17.txt:
This Internet-Draft describes NFSv4 minor version one, including
features retained from the base protocol and protocol extensions made
subsequently. The current draft includes description of the major
extensions, Sessions, Directory Delegations, and parallel NFS (pNFS).
This Internet-Draft is an active work item of the NFSv4 working
group. Active and resolved issues may be found in the issue tracker
at: http://www.nfsv4-editor.org/cgi-bin/roundup/nfsv4. New issues
related to this document should be raised with the NFSv4 Working
Group nfsv4@ietf.org.
draft-ietf-nfsv4-minorversion1-18.txt:
This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS
version 4 minor version one include: Sessions, Directory Delegations,
and parallel NFS (pNFS).
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1]. document are to be interpreted as described in RFC 2119 [1].
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 10 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1. The NFSv4.1 Protocol . . . . . . . . . . . . . . . . . . 10 1.1. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 11
1.2. NFS Version 4 Goals . . . . . . . . . . . . . . . . . . 10 1.2. Scope of this Document . . . . . . . . . . . . . . . . . 11
1.3. Minor Version 1 Goals . . . . . . . . . . . . . . . . . 11 1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11
1.4. Overview of NFS version 4.1 Features . . . . . . . . . . 11 1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12
1.4.1. RPC and Security . . . . . . . . . . . . . . . . . . 12 1.5. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 12
1.4.2. Protocol Structure . . . . . . . . . . . . . . . . . 12 1.5.1. RPC and Security . . . . . . . . . . . . . . . . . . 13
1.4.3. File System Model . . . . . . . . . . . . . . . . . 13 1.5.2. Protocol Structure . . . . . . . . . . . . . . . . . 13
1.4.4. Locking Facilities . . . . . . . . . . . . . . . . . 14 1.5.3. File System Model . . . . . . . . . . . . . . . . . 14
1.5. General Definitions . . . . . . . . . . . . . . . . . . 15 1.5.4. Locking Facilities . . . . . . . . . . . . . . . . . 15
1.6. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 17 1.6. General Definitions . . . . . . . . . . . . . . . . . . 16
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 17 1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 18
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 17 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 19
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 18 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 18 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 21 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19
2.4. Client Identifiers and Client Owners . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 25 2.4. Client Identifiers and Client Owners . . . . . . . . . . 23
2.4.2. Server Release of Client ID . . . . . . . . . . . . 26 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 26
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 26 2.4.2. Server Release of Client ID . . . . . . . . . . . . 27
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 27 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27
2.6. Security Service Negotiation . . . . . . . . . . . . . . 28 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 28
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 28 2.6. Security Service Negotiation . . . . . . . . . . . . . . 29
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 28 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 29
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 29 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 29
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 32 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 35 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 35 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 35 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 35 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 36
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 36 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 36
2.9.1. Required and Recommended Properties of Transports . 36 2.9.1. Required and Recommended Properties of Transports . 36
2.9.2. Client and Server Transport Behavior . . . . . . . . 36 2.9.2. Client and Server Transport Behavior . . . . . . . . 37
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 38 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 38 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 38 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 39 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 41 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 42 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 45 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 57 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 60 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 65 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 69 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 70
2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 70 2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 71
2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 74 2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 75
3. Protocol Data Types . . . . . . . . . . . . . . . . . . . . . 74 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 75
3.1. Basic Data Types . . . . . . . . . . . . . . . . . . . . 74 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 75
3.2. Structured Data Types . . . . . . . . . . . . . . . . . 76 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 76
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 86 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 78
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 86 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 86 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 87
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 87 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 88
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 87 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 88
4.2.1. General Properties of a Filehandle . . . . . . . . . 87 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 88
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 88 4.2.1. General Properties of a Filehandle . . . . . . . . . 89
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 88 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 89
4.3. One Method of Constructing a Volatile Filehandle . . . . 90 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 90
4.4. Client Recovery from Filehandle Expiration . . . . . . . 90 4.3. One Method of Constructing a Volatile Filehandle . . . . 91
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 91 4.4. Client Recovery from Filehandle Expiration . . . . . . . 92
5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 92 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 92
5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 93 5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 94
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 93 5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 94
5.4. Classification of Attributes . . . . . . . . . . . . . . 94 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 94
5.5. Mandatory Attributes - List and Definition References . 95 5.4. Classification of Attributes . . . . . . . . . . . . . . 96
5.5. Mandatory Attributes - List and Definition References . 97
5.6. Recommended Attributes - List and Definition 5.6. Recommended Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 95 References . . . . . . . . . . . . . . . . . . . . . . . 97
5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 96 5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 99
5.8. Interpreting owner and owner_group . . . . . . . . . . . 104 5.8. Interpreting owner and owner_group . . . . . . . . . . . 107
5.9. Character Case Attributes . . . . . . . . . . . . . . . 106 5.9. Character Case Attributes . . . . . . . . . . . . . . . 109
5.10. Directory Notification Attributes . . . . . . . . . . . 106 5.10. Directory Notification Attributes . . . . . . . . . . . 109
5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 107 5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 110
5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 108 5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 112
6. Security Related Attributes . . . . . . . . . . . . . . . . . 111 6. Security Related Attributes . . . . . . . . . . . . . . . . . 114
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 111 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 112 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 115
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 112 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 115
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 127 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 130
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 127 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 130
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 127 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 131
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 128 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 131
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 128 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 132
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 128 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 132
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 129 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 133
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 130 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 134
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 131 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 134
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 132 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 136
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 133 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 136
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 136 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 140
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 137 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 140
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 137 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 141
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 137 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 141
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 138 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 142
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 138 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 142
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 139 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 142
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 139 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 143
7.8. Security Policy and Namespace Presentation . . . . . . . 139 7.8. Security Policy and Namespace Presentation . . . . . . . 143
8. State Management . . . . . . . . . . . . . . . . . . . . . . 141 8. State Management . . . . . . . . . . . . . . . . . . . . . . 144
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 141 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 145
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 142 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 145
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 142 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 146
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 143 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 147
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 144 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 148
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 145 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 150
8.2.5. Stateid Use for IO Operations . . . . . . . . . . . 148 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 153
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 149 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 153
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 150 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 155
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 150 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 155
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 151 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 156
8.4.3. Network Partitions and Recovery . . . . . . . . . . 155 8.4.3. Network Partitions and Recovery . . . . . . . . . . 159
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 159 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 164
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 160 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 165
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 161 Expiration . . . . . . . . . . . . . . . . . . . . . . . 165
8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 161 8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 166
9. File Locking and Share Reservations . . . . . . . . . . . . . 162 9. File Locking and Share Reservations . . . . . . . . . . . . . 167
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 163 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 167
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 163 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 167
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 163 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 168
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 166 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 171
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 167 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 171
9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 167 9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 172
9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 168 9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 173
9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 169 9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 174
9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 170 9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 174
9.8. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 170 9.8. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 175
9.9. Reclaim of Open and Byte-range Locks . . . . . . . . . . 171 9.9. Reclaim of Open and Byte-range Locks . . . . . . . . . . 176
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 171 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 176
10.1. Performance Challenges for Client-Side Caching . . . . . 172 10.1. Performance Challenges for Client-Side Caching . . . . . 177
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 173 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 178
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 174 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 180
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 176 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 182
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 177 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 182
10.3.2. Data Caching and File Locking . . . . . . . . . . . 178 10.3.2. Data Caching and File Locking . . . . . . . . . . . 183
10.3.3. Data Caching and Mandatory File Locking . . . . . . 179 10.3.3. Data Caching and Mandatory File Locking . . . . . . 185
10.3.4. Data Caching and File Identity . . . . . . . . . . . 180 10.3.4. Data Caching and File Identity . . . . . . . . . . . 185
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 181 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 187
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 183 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 189
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 184 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 190
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 185 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 191
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 188 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 194
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 190 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 195
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 190 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 196
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 191 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 197
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 192 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 197
10.5.1. Revocation Recovery for Write Open Delegation . . . 192 10.5.1. Revocation Recovery for Write Open Delegation . . . 198
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 193 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 199
10.7. Data and Metadata Caching and Memory Mapped Files . . . 195 10.7. Data and Metadata Caching and Memory Mapped Files . . . 201
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 197 Delegations . . . . . . . . . . . . . . . . . . . . . . 203
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 197 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 203
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 199 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 205
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 200 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 205
10.9.1. Introduction to Directory Delegations . . . . . . . 200 10.9.1. Introduction to Directory Delegations . . . . . . . 206
10.9.2. Directory Delegation Design . . . . . . . . . . . . 201 10.9.2. Directory Delegation Design . . . . . . . . . . . . 207
10.9.3. Attributes in Support of Directory Notifications . . 202 10.9.3. Attributes in Support of Directory Notifications . . 208
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 202 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 208
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 202 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 208
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 202 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 209
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 203 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 209
11.2. File System Presence or Absence . . . . . . . . . . . . 203 11.2. File System Presence or Absence . . . . . . . . . . . . 209
11.3. Getting Attributes for an Absent File System . . . . . . 204 11.3. Getting Attributes for an Absent File System . . . . . . 211
11.3.1. GETATTR Within an Absent File System . . . . . . . . 205 11.3.1. GETATTR Within an Absent File System . . . . . . . . 211
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 206 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 212
11.4. Uses of Location Information . . . . . . . . . . . . . . 206 11.4. Uses of Location Information . . . . . . . . . . . . . . 213
11.4.1. File System Replication . . . . . . . . . . . . . . 207 11.4.1. File System Replication . . . . . . . . . . . . . . 213
11.4.2. File System Migration . . . . . . . . . . . . . . . 208 11.4.2. File System Migration . . . . . . . . . . . . . . . 214
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 209 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 215
11.5. Additional Client-side Considerations . . . . . . . . . 211 11.5. Location Entries and Server Identity . . . . . . . . . . 217
11.6. Effecting File System Transitions . . . . . . . . . . . 211 11.6. Additional Client-side Considerations . . . . . . . . . 217
11.6.1. File System Transitions and Simultaneous Access . . 213 11.7. Effecting File System Transitions . . . . . . . . . . . 218
11.6.2. Simultaneous Use and Transparent Transitions . . . . 213 11.7.1. File System Transitions and Simultaneous Access . . 219
11.6.3. Filehandles and File System Transitions . . . . . . 216 11.7.2. Simultaneous Use and Transparent Transitions . . . . 220
11.6.4. Fileids and File System Transitions . . . . . . . . 216 11.7.3. Filehandles and File System Transitions . . . . . . 223
11.6.5. Fsids and File System Transitions . . . . . . . . . 217 11.7.4. Fileids and File System Transitions . . . . . . . . 223
11.6.6. The Change Attribute and File System Transitions . . 218 11.7.5. Fsids and File System Transitions . . . . . . . . . 224
11.6.7. Lock State and File System Transitions . . . . . . . 219 11.7.6. The Change Attribute and File System Transitions . . 225
11.6.8. Write Verifiers and File System Transitions . . . . 222 11.7.7. Lock State and File System Transitions . . . . . . . 226
11.6.9. Readdir Cookies and Verifiers and File System 11.7.8. Write Verifiers and File System Transitions . . . . 229
Transitions . . . . . . . . . . . . . . . . . . . . 223 11.7.9. Readdir Cookies and Verifiers and File System
11.6.10. File System Data and File System Transitions . . . . 223 Transitions . . . . . . . . . . . . . . . . . . . . 230
11.7. Effecting File System Referrals . . . . . . . . . . . . 225 11.7.10. File System Data and File System Transitions . . . . 230
11.7.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 225 11.8. Effecting File System Referrals . . . . . . . . . . . . 231
11.7.2. Referral Example (READDIR) . . . . . . . . . . . . . 229 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 232
11.8. The Attribute fs_locations . . . . . . . . . . . . . . . 231 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 236
11.9. The Attribute fs_locations_info . . . . . . . . . . . . 233 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 238
11.9.1. The fs_locations_server4 Structure . . . . . . . . . 237 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 240
11.9.2. The fs_locations_info4 Structure . . . . . . . . . . 242 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 244
11.9.3. The fs_locations_item4 Structure . . . . . . . . . . 243 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 249
11.10. The Attribute fs_status . . . . . . . . . . . . . . . . 245 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 250
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 249 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 252
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 249 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 256
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 251 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 256
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 251 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 257
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 251 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 258
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 252 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 258
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 252 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 258
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 252 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 258
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 252 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 258
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 252 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 258
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 253 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 259
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 253 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 259
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 254 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 260
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 255 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 260
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 256 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 262
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 256 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 263
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 256 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 263
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 257 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 263
12.5.3. Committing a Layout . . . . . . . . . . . . . . . . 258 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 264
12.5.4. Recalling a Layout . . . . . . . . . . . . . . . . . 261 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 265
12.5.5. Metadata Server Write Propagation . . . . . . . . . 267 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 266
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 267 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 269
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 269 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 276
12.7.1. Client Recovery . . . . . . . . . . . . . . . . . . 269 12.5.7. Metadata Server Write Propagation . . . . . . . . . 276
12.7.2. Dealing with Lease Expiration on the Client . . . . 269 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 276
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 278
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 278
12.7.2. Dealing with Lease Expiration on the Client . . . . 278
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 271 Server . . . . . . . . . . . . . . . . . . . . . . . 279
12.7.4. Recovery from Metadata Server Restart . . . . . . . 271 12.7.4. Recovery from Metadata Server Restart . . . . . . . 280
12.7.5. Operations During Metadata Server Grace Period . . . 273 12.7.5. Operations During Metadata Server Grace Period . . . 282
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 274 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 282
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 274 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 283
12.9. Security Considerations . . . . . . . . . . . . . . . . 276 12.9. Security Considerations for pNFS . . . . . . . . . . . . 284
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 277 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 285
13.1. Client ID and Session Considerations . . . . . . . . . . 277 13.1. Client ID and Session Considerations . . . . . . . . . . 285
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 278 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 287
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 279 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 287
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 282 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 291
13.4.1. Interpreting the File Layout Using Sparse Packing . 283 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 291
13.4.2. Interpreting the File Layout Using Dense Packing . . 285 13.4.2. Interpreting the File Layout Using Sparse Packing . 292
13.5. Sparse and Dense Stripe Unit Packing . . . . . . . . . . 287 13.4.3. Interpreting the File Layout Using Dense Packing . . 294
13.6. Data Server Multipathing . . . . . . . . . . . . . . . . 289 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 296
13.7. Operations Issued to NFSv4.1 Data Servers . . . . . . . 290 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 298
13.8. COMMIT Through Metadata Server . . . . . . . . . . . . . 292 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 299
13.9. The Layout Iomode . . . . . . . . . . . . . . . . . . . 293 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 301
13.10. Metadata and Data Server State Coordination . . . . . . 293 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 303
13.10.1. Global Stateid Requirements . . . . . . . . . . . . 293 13.9. Metadata and Data Server State Coordination . . . . . . 303
13.10.2. Data Server State Propagation . . . . . . . . . . . 295 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 303
13.11. Data Server Component File Size . . . . . . . . . . . . 297 13.9.2. Data Server State Propagation . . . . . . . . . . . 304
13.12. Recovery from Loss of Layout . . . . . . . . . . . . . . 297 13.10. Data Server Component File Size . . . . . . . . . . . . 306
13.13. Security Considerations for the File Layout Type . . . . 298 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 307
14. Internationalization . . . . . . . . . . . . . . . . . . . . 298 13.12. Security Considerations for the File Layout Type . . . . 307
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 300 14. Internationalization . . . . . . . . . . . . . . . . . . . . 308
14.2. Stringprep profile for the utf8str_cis type . . . . . . 301 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 309
14.3. Stringprep profile for the utf8str_mixed type . . . . . 303 14.2. Stringprep profile for the utf8str_cis type . . . . . . 311
14.4. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 304 14.3. Stringprep profile for the utf8str_mixed type . . . . . 312
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 304 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 314
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 305 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 314
15.2. Operations and their valid errors . . . . . . . . . . . 321 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 315
15.3. Callback operations and their valid errors . . . . . . . 337 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 315
15.4. Errors and the operations that use them . . . . . . . . 339 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 317
16. NFS version 4.1 Procedures . . . . . . . . . . . . . . . . . 353 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 319
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 353 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 320
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 354 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 322
17. Operations: mandatory or optional . . . . . . . . . . . . . . 364 15.1.5. State Management Errors . . . . . . . . . . . . . . 324
18. NFS version 4.1 Operations . . . . . . . . . . . . . . . . . 367 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 325
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 367 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 325
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 370 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 326
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 371 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 327
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 373 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 328
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 329
15.1.12. Session Management Errors . . . . . . . . . . . . . 330
15.1.13. Client Management Errors . . . . . . . . . . . . . . 331
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 332
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 332
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 333
15.2. Operations and their valid errors . . . . . . . . . . . 334
15.3. Callback operations and their valid errors . . . . . . . 350
15.4. Errors and the operations that use them . . . . . . . . 352
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 366
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 366
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 367
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 377
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 380
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 380
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 383
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 384
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 387
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 376 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 390
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 377 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 391
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 377 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 391
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 379 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 393
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 379 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 394
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 381 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 396
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 385 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 400
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 386 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 402
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 388 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 403
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 389 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 405
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 390 Attributes . . . . . . . . . . . . . . . . . . . . . . . 406
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 391 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 407
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 407 Directory . . . . . . . . . . . . . . . . . . . . . . . 426
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 408 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 427
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 409 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 428
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 410 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 429
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 411 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 431
18.22. Operation 25: READ - Read from File . . . . . . . . . . 411 18.22. Operation 25: READ - Read from File . . . . . . . . . . 431
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 413 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 434
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 417 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 437
18.25. Operation 28: REMOVE - Remove File System Object . . . . 418 18.25. Operation 28: REMOVE - Remove File System Object . . . . 438
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 420 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 441
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 421 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 444
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 422 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 445
18.29. Operation 33: SECINFO - Obtain Available Security . . . 423 18.29. Operation 33: SECINFO - Obtain Available Security . . . 446
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 426 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 449
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 429 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 452
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 430 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 453
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 434 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 458
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 435 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 459
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 438 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 462
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 454 Confirm Client ID . . . . . . . . . . . . . . . . . . . 478
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 464 session . . . . . . . . . . . . . . . . . . . . . . . . 487
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 466 locks . . . . . . . . . . . . . . . . . . . . . . . . . 489
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 466 delegation . . . . . . . . . . . . . . . . . . . . . . . 490
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 471 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 494
18.41. Operation 48: GETDEVICELIST . . . . . . . . . . . . . . 472 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings . 496
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 474 a layout . . . . . . . . . . . . . . . . . . . . . . . . 498
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 478 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 502
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 481 Information . . . . . . . . . . . . . . . . . . . . . . 506
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 485 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 510
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 486 sequencing and control . . . . . . . . . . . . . . . . . 511
18.47. Operation 54: SET_SSV . . . . . . . . . . . . . . . . . 492 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 517
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 494 validity . . . . . . . . . . . . . . . . . . . . . . . . 519
18.49. Operation 56: WANT_DELEGATION . . . . . . . . . . . . . 496 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 521
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 500 client ID . . . . . . . . . . . . . . . . . . . . . . . 525
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 500 Finished . . . . . . . . . . . . . . . . . . . . . . . . 525
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 502 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 528
19. NFS version 4.1 Callback Procedures . . . . . . . . . . . . . 503 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 528
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 503 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 529
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 504 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 529
20. NFS version 4.1 Callback Operations . . . . . . . . . . . . . 508 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 533
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 508 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 533
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 509 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 534
20.3. Operation 5: CB_LAYOUTRECALL . . . . . . . . . . . . . . 510 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 513 Client . . . . . . . . . . . . . . . . . . . . . . . . . 535
20.5. Operation 7: CB_PUSH_DELEG . . . . . . . . . . . . . . . 517 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 539
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 518 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL . . . . . . . . . . 520 Client . . . . . . . . . . . . . . . . . . . . . . . . . 543
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 544
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 546
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 521 limits . . . . . . . . . . . . . . . . . . . . . . . . . 547
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 522 sequencing and control . . . . . . . . . . . . . . . . . 548
20.10. Operation 12: CB_WANTS_CANCELLED . . . . . . . . . . . . 524 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 550
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 525 lock availability . . . . . . . . . . . . . . . . . . . 551
20.12. Operation 10044: CB_ILLEGAL - Illegal Callback 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
Operation . . . . . . . . . . . . . . . . . . . . . . . 527 changes . . . . . . . . . . . . . . . . . . . . . . . . 553
21. Security Considerations . . . . . . . . . . . . . . . . . . . 527 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 528 Operation . . . . . . . . . . . . . . . . . . . . . . . 555
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 528 21. Security Considerations . . . . . . . . . . . . . . . . . . . 555
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 528 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 557
22.3. Defining New Notifications . . . . . . . . . . . . . . . 529 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 557
22.4. Defining new layout types . . . . . . . . . . . . . . . 530 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 557
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 531 22.3. Defining New Notifications . . . . . . . . . . . . . . . 559
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 531 22.4. Defining new layout types . . . . . . . . . . . . . . . 559
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 531 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 560
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 532 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 560
23.1. Normative References . . . . . . . . . . . . . . . . . . 532 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 561
23.2. Informative References . . . . . . . . . . . . . . . . . 533 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 561
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 535 23.1. Normative References . . . . . . . . . . . . . . . . . . 561
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 536 23.2. Informative References . . . . . . . . . . . . . . . . . 562
Intellectual Property and Copyright Statements . . . . . . . . . 538 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 564
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 566
Intellectual Property and Copyright Statements . . . . . . . . . 567
1. Introduction 1. Introduction
1.1. The NFSv4.1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFSv4.1 protocol is a minor version of the NFSv4 protocol The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
described in [20]. It generally follows the guidelines for minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
versioning model laid out in Section 10 of RFC 3530. However, it version, NFSv4.0, is described in [20]. It generally follows the
diverges from guidelines 11 ("a client and server that supports minor guidelines for minor versioning model listed in Section 10 of RFC
version X must support minor versions 0 through X-1"), and 12 ("no 3530. However, it diverges from guidelines 11 ("a client and server
features may be introduced as mandatory in a minor version"). These that supports minor version X must support minor versions 0 through
divergences are due to the introduction of the sessions model for X-1"), and 12 ("no features may be introduced as mandatory in a minor
managing non-idempotent operations and the RECLAIM_COMPLETE version"). These divergences are due to the introduction of the
operation. These two new features are infrastructural in nature and sessions model for managing non-idempotent operations and the
simplify implementation of existing and other new features. Making RECLAIM_COMPLETE operation. These two new features are
them optional would add undue complexity to protocol definition and infrastructural in nature and simplify implementation of existing and
implementation. NFSv4.1 accordingly updates the Minor Versioning other new features. Making them optional would add undue complexity
guidelines (Section 2.7). to protocol definition and implementation. NFSv4.1 accordingly
updates the Minor Versioning guidelines (Section 2.7).
NFSv4.1, as a minor version, is consistent with the overall goals for As a minor version, NFSv4.1 is consistent with the overall goals for
NFS Version 4, but extends the protocol so as to better meet those NFSv4, but extends the protocol so as to better meet those goals,
goals, based on experiences with NFSv4.0. In addition, NFSv4.1 has based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted
adopted some additional goals, which motivate some of the major some additional goals, which motivate some of the major extensions in
extensions in minor version 1. NFSv4.1.
1.2. NFS Version 4 Goals 1.2. Scope of this Document
The NFS version 4 protocol is a further revision of the NFS protocol This document describes the NFSv4.1 protocol. With respect to
defined already by versions 2 [21] and 3 [22]. It retains the NFSv4.0, this document does not:
essential characteristics of previous versions: design for easy
recovery, independent of transport protocols, operating systems and o describe the NFSv4.0 protocol, except where needed to contrast
file systems, simplicity, and good performance. The NFS version 4 with NFSv4.1.
revision has the following goals:
o modify the specification of the NFSv4.0 protocol.
o clarify the NFSv4.0 protocol.
1.3. NFSv4 Goals
The NFSv4 protocol is a further revision of the NFS protocol defined
already by NFSv3 [21]. It retains the essential characteristics of
previous versions: design for easy recovery, independent of transport
protocols, operating systems and file systems, simplicity, and good
performance. NFSv4 has the following goals:
o Improved access and good performance on the Internet. o Improved access and good performance on the Internet.
The protocol is designed to transit firewalls easily, perform well The protocol is designed to transit firewalls easily, perform well
where latency is high and bandwidth is low, and scale to very where latency is high and bandwidth is low, and scale to very
large numbers of clients per server. large numbers of clients per server.
o Strong security with negotiation built into the protocol. o Strong security with negotiation built into the protocol.
The protocol builds on the work of the ONCRPC working group in The protocol builds on the work of the ONCRPC working group in
supporting the RPCSEC_GSS protocol. Additionally, the NFS version supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1
4 protocol provides a mechanism to allow clients and servers the protocol provides a mechanism to allow clients and servers the
ability to negotiate security and require clients and servers to ability to negotiate security and require clients and servers to
support a minimal set of security schemes. support a minimal set of security schemes.
o Good cross-platform interoperability. o Good cross-platform interoperability.
The protocol features a file system model that provides a useful, The protocol features a file system model that provides a useful,
common set of features that does not unduly favor one file system common set of features that does not unduly favor one file system
or operating system over another. or operating system over another.
o Designed for protocol extensions. o Designed for protocol extensions.
The protocol is designed to accept standard extensions within a The protocol is designed to accept standard extensions within a
framework that enables and encourages backward compatibility. framework that enables and encourages backward compatibility.
1.3. Minor Version 1 Goals 1.4. NFSv4.1 Goals
Minor version one has the following goals, within the framework NFSv4.1 has the following goals, within the framework established by
established by the overall version 4 goals. the overall NFSv4 goals.
o To correct significant structural weaknesses and oversights o To correct significant structural weaknesses and oversights
discovered in the base protocol. discovered in the base protocol.
o To add clarity and specificity to areas left unaddressed or not o To add clarity and specificity to areas left unaddressed or not
addressed in sufficient detail in the base protocol. addressed in sufficient detail in the base protocol.
o To add specific features based on experience with the existing o To add specific features based on experience with the existing
protocol and recent industry developments. protocol and recent industry developments.
o To provide protocol support to take advantage of clustered server o To provide protocol support to take advantage of clustered server
deployments including the ability to provide scalable parallel deployments including the ability to provide scalable parallel
access to files distributed among multiple servers. access to files distributed among multiple servers.
1.4. Overview of NFS version 4.1 Features 1.5. Overview of NFSv4.1 Features
To provide a reasonable context for the reader, the major features of To provide a reasonable context for the reader, the major features of
NFS version 4.1 protocol will be reviewed in brief. This will be the NFSv4.1 protocol will be reviewed in brief. This will be done to
done to provide an appropriate context for both the reader who is provide an appropriate context for both the reader who is familiar
familiar with the previous versions of the NFS protocol and the with the previous versions of the NFS protocol and the reader that is
reader that is new to the NFS protocols. For the reader new to the new to the NFS protocols. For the reader new to the NFS protocols,
NFS protocols, there is still a set of fundamental knowledge that is there is still a set of fundamental knowledge that is expected. The
expected. The reader should be familiar with the XDR and RPC reader should be familiar with the XDR and RPC protocols as described
protocols as described in [2] and [3]. A basic knowledge of file in [2] and [3]. A basic knowledge of file systems and distributed
systems and distributed file systems is expected as well. file systems is expected as well.
This description of version 4.1 features will not distinguish those In general, this specification of NFSv4.1 will not distinguish features
added in minor version one from those present in the base protocol added in minor version one from those present in the base protocol
but will treat minor version 1 as a unified whole. See Section 1.6 but will treat NFSv4.1 as a unified whole. See Section 1.7 for a
for a description of the differences between the two minor versions. summary of the differences between NFSv4.0 and NFSv4.1.
1.4.1. RPC and Security 1.5.1. RPC and Security
As with previous versions of NFS, the External Data Representation As with previous versions of NFS, the External Data Representation
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFS (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1
version 4.1 protocol are those defined in [2] and [3]. To meet end- protocol are those defined in [2] and [3]. To meet end-to-end
to-end security requirements, the RPCSEC_GSS framework [4] will be security requirements, the RPCSEC_GSS framework [4] will be used to
used to extend the basic RPC security. With the use of RPCSEC_GSS, extend the basic RPC security. With the use of RPCSEC_GSS, various
various mechanisms can be provided to offer authentication, mechanisms can be provided to offer authentication, integrity, and
integrity, and privacy to the NFS version 4 protocol. Kerberos V5 privacy to the NFSv4 protocol. Kerberos V5 will be used as described
will be used as described in [5] to provide one security framework. in [5] to provide one security framework. The LIPKEY and SPKM-3 GSS-
The LIPKEY and SPKM-3 GSS-API mechanisms described in [6] will be API mechanisms described in [6] will be used to provide for the use
used to provide for the use of user password and client/server public of user password and client/server public key certificates by the
key certificates by the NFS version 4 protocol. With the use of NFSv4 protocol. With the use of RPCSEC_GSS, other mechanisms may
RPCSEC_GSS, other mechanisms may also be specified and used for NFS also be specified and used for NFSv4.1 security.
version 4.1 security.
To enable in-band security negotiation, the NFS version 4.1 protocol To enable in-band security negotiation, the NFSv4.1 protocol has
has operations which provide the client a method of querying the operations which provide the client a method of querying the server
server about its policies regarding which security mechanisms must be about its policies regarding which security mechanisms must be used
used for access to the server's file system resources. With this, for access to the server's file system resources. With this, the
the client can securely match the security mechanism that meets the client can securely match the security mechanism that meets the
policies specified at both the client and server. policies specified at both the client and server.
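The matching step described above can be pictured with a minimal sketch. It is not taken from the draft: the function name choose_flavor, the flavor labels, and the rule "take the first server-listed mechanism that the client policy also allows" are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def choose_flavor(server_flavors: Sequence[str],
                  client_allowed: Sequence[str]) -> Optional[str]:
    """Return the first server-listed flavor that the client policy permits.

    Assumption: the server reports its acceptable mechanisms in preference
    order, and the client simply honors that order.
    """
    allowed = set(client_allowed)
    for flavor in server_flavors:
        if flavor in allowed:
            return flavor
    return None  # no mechanism satisfies both policies


# Hypothetical policies; the labels are illustrative only.
server_policy = ["krb5p", "krb5i", "krb5"]
client_policy = ["krb5i", "krb5"]
selected = choose_flavor(server_policy, client_policy)
```

When no common mechanism exists, the sketch returns None, which corresponds to the client being unable to access that part of the server's file system under its own policy.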
1.4.2. Protocol Structure 1.5.2. Protocol Structure
1.4.2.1. Core Protocol 1.5.2.1. Core Protocol
Unlike NFS Versions 2 and 3, which used a series of ancillary Unlike NFSv3, which used a series of ancillary protocols (e.g. NLM,
protocols (e.g. NLM, NSM, MOUNT), within all minor versions of NFS NSM, MOUNT), within all minor versions of NFSv4 a single RPC protocol
version 4 only a single RPC protocol is used to make requests of the is used to make requests to the server. Facilities that had been
server. Facilities that had been separate protocols, such as separate protocols, such as locking, are now integrated within a
locking, are now integrated within a single unified protocol. single unified protocol.
1.4.2.2. Parallel Access 1.5.2.2. Parallel Access
Minor version one supports high-performance data access to a Minor version one supports high-performance data access to a
clustered server implementation by enabling a separation of metadata clustered server implementation by enabling a separation of metadata
access and data access, with the latter done to multiple servers in access and data access, with the latter done to multiple servers in
parallel. parallel.
Such parallel data access is controlled by recallable objects known Such parallel data access is controlled by recallable objects known
as "layouts", which are integrated into the protocol locking model. as "layouts", which are integrated into the protocol locking model.
Clients direct requests for data access to a set of data servers Clients direct requests for data access to a set of data servers
specified by the layout via a data storage protocol which may be specified by the layout via a data storage protocol which may be
NFSv4.1 or may be another protocol. NFSv4.1 or may be another protocol.
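As a rough illustration of how a client might direct requests to the data servers named by a layout, the sketch below splits a byte range across servers. The round-robin striping rule, the function name split_by_stripe, and the server labels are assumptions for illustration; the actual mapping is defined by the layout type, not by this overview.

```python
from typing import List, Sequence, Tuple


def split_by_stripe(offset: int, length: int, stripe_unit: int,
                    servers: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Split [offset, offset + length) into (server, offset, length) chunks.

    Assumption: stripe units are assigned to data servers round-robin.
    """
    chunks = []
    end = offset + length
    while offset < end:
        unit = offset // stripe_unit            # stripe unit holding this offset
        unit_end = (unit + 1) * stripe_unit     # first byte of the next unit
        n = min(end, unit_end) - offset         # bytes remaining in this unit
        chunks.append((servers[unit % len(servers)], offset, n))
        offset += n
    return chunks


# A 10-byte request with 4-byte stripe units over two hypothetical servers.
plan = split_by_stripe(0, 10, 4, ["ds0", "ds1"])
```

Each chunk in the resulting plan can be sent to its data server independently, which is what allows the data access to proceed in parallel.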
1.4.3. File System Model 1.5.3. File System Model
The general file system model used for the NFS version 4.1 protocol The general file system model used for the NFSv4.1 protocol is the
is the same as previous versions. The server file system is same as previous versions. The server file system is hierarchical
hierarchical with the regular files contained within being treated as with the regular files contained within being treated as opaque byte
opaque byte streams. In a slight departure, file and directory names streams. In a slight departure, file and directory names are encoded
are encoded with UTF-8 to deal with the basics of with UTF-8 to deal with the basics of internationalization.
internationalization.
The NFS version 4.1 protocol does not require a separate protocol to The NFSv4.1 protocol does not require a separate protocol to provide
provide for the initial mapping between path name and filehandle. for the initial mapping between path name and filehandle. All file
All file systems exported by a server are presented as a tree so that systems exported by a server are presented as a tree so that all file
all file systems are reachable from a special per-server global root systems are reachable from a special per-server global root
filehandle. This allows LOOKUP operations to be used to perform filehandle. This allows LOOKUP operations to be used to perform
functions previously provided by the MOUNT protocol. The server functions previously provided by the MOUNT protocol. The server
provides any necessary pseudo file systems to bridge any gaps that provides any necessary pseudo file systems to bridge any gaps that
arise due to unexported gaps between exported file systems. arise due to unexported gaps between exported file systems.
1.4.3.1. Filehandles 1.5.3.1. Filehandles
As in previous versions of the NFS protocol, opaque filehandles are As in previous versions of the NFS protocol, opaque filehandles are
used to identify individual files and directories. Lookup-type and used to identify individual files and directories. Lookup-type and
create operations are used to go from file and directory names to the create operations are used to go from file and directory names to the
filehandle which is then used to identify the object to subsequent filehandle which is then used to identify the object to subsequent
operations. operations.
The NFS version 4.1 protocol provides support for persistent The NFSv4.1 protocol provides support for persistent filehandles,
filehandles, guaranteed to be valid for the lifetime of the file guaranteed to be valid for the lifetime of the file system object
system object designated. In addition it provides support to servers designated. In addition it provides support to servers to provide
to provide filehandles with more limited validity guarantees, called filehandles with more limited validity guarantees, called volatile
volatile filehandles. filehandles.
1.4.3.2. File Attributes 1.5.3.2. File Attributes
The NFS version 4.1 protocol has a rich and extensible attribute The NFSv4.1 protocol has a rich and extensible attribute structure.
structure. Only a small set of the defined attributes are mandatory Only a small set of the defined attributes are mandatory and must be
and must be provided by all server implementations. The other provided by all server implementations. The other attributes are
attributes are known as "recommended" attributes. known as "recommended" attributes.
The acl, sacl, and dacl attributes are a significant set of file The acl, sacl, and dacl attributes are a significant set of file
attributes that make up the Access Control List (ACL) of a file. attributes that make up the Access Control List (ACL) of a file.
These attributes provide for directory and file access control beyond These attributes provide for directory and file access control beyond
the model used in NFS Versions 2 and 3. The ACL definition allows the model used in NFSv3. The ACL definition allows for specification
for specification of specific sets of permissions for individual of specific sets of permissions for individual users and groups. In
users and groups. In addition, ACL inheritance allows propagation of addition, ACL inheritance allows propagation of access permissions
access permissions and restriction down a directory tree as file and restriction down a directory tree as file system objects are
system objects are created. created.
One other type of attribute is the named attribute. A named One other type of attribute is the named attribute. A named
attribute is an opaque byte stream that is associated with a attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate are meant to be used by client applications as a method to associate
application-specific data with a regular file or directory. application-specific data with a regular file or directory. NFSv4.1
modifies named attributes relative to NFSv4.0 by tightening the
allowed operations in order to prevent the development of non-
interoperable implementation. See Section 5.3 for details.
1.4.3.3. Multi-server Namespace 1.5.3.3. Multi-server Namespace
NFS Version 4.1 contains a number of features to allow implementation NFSv4.1 contains a number of features to allow implementation of
of namespaces that cross server boundaries and that allow and namespaces that cross server boundaries and that allow and facilitate
facilitate a non-disruptive transfer of support for individual file a non-disruptive transfer of support for individual file systems
systems between servers. They are all based upon attributes that between servers. They are all based upon attributes that allow one
allow one file system to specify alternate or new locations for that file system to specify alternate or new locations for that file
file system. system.
These attributes may be used together with the concept of absent file These attributes may be used together with the concept of absent file
systems, which provide specifications for additional locations but no systems, which provide specifications for additional locations but no
actual file system content. This allows a number of important actual file system content. This allows a number of important
facilities: facilities:
o Location attributes may be used with absent file systems to o Location attributes may be used with absent file systems to
implement referrals whereby one server may direct the client to a implement referrals whereby one server may direct the client to a
file system provided by another server. This allows extensive file system provided by another server. This allows extensive
multi-server namespaces to be constructed. multi-server namespaces to be constructed.
o Location attributes may be provided for present file systems to o Location attributes may be provided for present file systems to
provide the locations of alternate file system instances or provide the locations of alternate file system instances or
replicas to be used in the event that the current file system replicas to be used in the event that the current file system
instance becomes unavailable. instance becomes unavailable.
o Location attributes may be provided when a previously present file o Location attributes may be provided when a previously present file
system becomes absent. This allows non-disruptive migration of system becomes absent. This allows non-disruptive migration of
file systems to alternate servers. file systems to alternate servers.
1.4.4. Locking Facilities 1.5.4. Locking Facilities
As mentioned previously, NFSv4.1 is a single protocol which includes As mentioned previously, NFSv4.1 is a single protocol which includes
locking facilities. These locking facilities include support for locking facilities. These locking facilities include support for
many types of locks including a number of sorts of recallable locks. many types of locks including a number of sorts of recallable locks.
Recallable locks such as delegations allow the client to be assured Recallable locks such as delegations allow the client to be assured
that certain events will not occur so long as that lock is held. that certain events will not occur so long as that lock is held.
When circumstances change, the lock is recalled via a callback When circumstances change, the lock is recalled via a callback
request. The assurances provided by delegations allow more extensive request. The assurances provided by delegations allow more extensive
caching to be done safely when circumstances allow it. caching to be done safely when circumstances allow it.
skipping to change at page 15, line 29 skipping to change at page 16, line 35
client and that no change to the data's location inconsistent with client and that no change to the data's location inconsistent with
that access may be made so long as the layout is held. that access may be made so long as the layout is held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When leases are not promptly renewed, locks are renew that lease. When leases are not promptly renewed, locks are
subject to revocation. In the event of server reboot, clients have subject to revocation. In the event of server reboot, clients have
the opportunity to safely reclaim their locks within a special grace the opportunity to safely reclaim their locks within a special grace
period. period.
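
The client-wide lease described above can be pictured with a minimal sketch. The class name, method names, and the injectable clock are hypothetical and chosen only for illustration; the lease duration is an assumed parameter, since this section does not state how it is chosen.

```python
import time


class ClientLease:
    """One lease covering all locks held by a single client (illustrative only)."""

    def __init__(self, lease_seconds, now=time.monotonic):
        # lease_seconds is an assumed parameter, not a value from the draft.
        self.lease_seconds = lease_seconds
        self._now = now
        self.last_renewed = now()

    def on_request(self):
        """Any request made on a session of this client renews the lease."""
        self.last_renewed = self._now()

    def revocable(self):
        """True once the lease has gone unrenewed for longer than the lease period.

        At that point the client's locks are subject to revocation; whether the
        server actually revokes them is outside the scope of this sketch.
        """
        return self._now() - self.last_renewed > self.lease_seconds
```

Passing the clock in as a parameter keeps the sketch testable without waiting for real time to pass.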
1.5. General Definitions 1.6. General Definitions
The following definitions are provided for the purpose of providing The following definitions are provided for the purpose of providing
an appropriate context for the reader. an appropriate context for the reader.
Byte This document defines a byte as an octet, i.e. a datum exactly Byte This document defines a byte as an octet, i.e. a datum exactly
8 bits in length. 8 bits in length.
Client The "client" is the entity that accesses the NFS server's Client The "client" is the entity that accesses the NFS server's
resources. The client may be an application which contains the resources. The client may be an application which contains the
logic to access the NFS server directly. The client may also be logic to access the NFS server directly. The client may also be
skipping to change at page 16, line 44 skipping to change at page 18, line 5
Server Owner The "Server Owner" identifies the server to the client. Server Owner The "Server Owner" identifies the server to the client.
The server owner consists of a major and minor identifier. When The server owner consists of a major and minor identifier. When
the client has two connections each to a peer with the same major the client has two connections each to a peer with the same major
identifier, the client assumes both peers are the same server (the identifier, the client assumes both peers are the same server (the
server namespace is the same via each connection), and assumes that server namespace is the same via each connection), and assumes that
lock state is sharable across both connections. When each peer lock state is sharable across both connections. When each peer
has both the same major and minor identifier, the client assumes each has both the same major and minor identifier, the client assumes each
connection might be associatable with the same session. connection might be associatable with the same session.
Stable Storage NFS version 4 servers must be able to recover without Stable Storage NFSv4.1 servers must be able to recover without data
data loss from multiple power failures (including cascading power loss from multiple power failures (including cascading power
failures, that is, several power failures in quick succession), failures, that is, several power failures in quick succession),
operating system failures, and hardware failure of components operating system failures, and hardware failure of components
other than the storage medium itself (for example, disk, other than the storage medium itself (for example, disk,
nonvolatile RAM). nonvolatile RAM).
Some examples of stable storage that are allowable for an NFS Some examples of stable storage that are allowable for an NFS
server include: server include:
1. Media commit of data, that is, the modified data has been 1. Media commit of data, that is, the modified data has been
successfully written to the disk media, for example, the disk successfully written to the disk media, for example, the disk
skipping to change at page 17, line 26 skipping to change at page 18, line 36
recovery software. recovery software.
Stateid A 128-bit quantity returned by a server that uniquely Stateid A 128-bit quantity returned by a server that uniquely
defines the open and locking state provided by the server for a defines the open and locking state provided by the server for a
specific open or lock owner for a specific file and type of lock. specific open or lock owner for a specific file and type of lock.
Verifier A 64-bit quantity generated by the client that the server Verifier A 64-bit quantity generated by the client that the server
can use to determine if the client has restarted and lost all can use to determine if the client has restarted and lost all
previous lock state. previous lock state.
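
The comparison rule in the Server Owner definition above can be sketched as follows. The field names, their types, and the returned labels are hypothetical; only the major/minor comparison logic is taken from the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    # Field names and types are assumptions made for this sketch.
    major_id: bytes
    minor_id: int


def classify_peers(a: ServerOwner, b: ServerOwner) -> str:
    """Describe what a client may assume about two connected peers."""
    if a.major_id != b.major_id:
        # Different major identifiers: no basis to treat them as one server.
        return "different-servers"
    if a.minor_id != b.minor_id:
        # Same major identifier: same server, lock state sharable.
        return "same-server"
    # Same major and minor identifiers: the connections might also be
    # associatable with the same session.
    return "same-server-session-candidate"
```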
1.6. Differences from NFSv4.0 1.7. Differences from NFSv4.0
The following summarizes the differences between minor version one The following summarizes the differences between minor version one
and the base protocol: and the base protocol:
o Implementation of the sessions model. o Implementation of the sessions model.
o Support for parallel access to data. o Support for parallel access to data.
o Addition of the RECLAIM_COMPLETE operation to better structure the o Addition of the RECLAIM_COMPLETE operation to better structure the
lock reclamation process. lock reclamation process.
skipping to change at page 17, line 49 skipping to change at page 19, line 11
addition to regular files. addition to regular files.
o Operations to re-obtain a delegation. o Operations to re-obtain a delegation.
o Support for client and server implementation id's. o Support for client and server implementation id's.
2. Core Infrastructure 2. Core Infrastructure
2.1. Introduction 2.1. Introduction
NFS version 4.1 (NFSv4.1) relies on core infrastructure common to NFSv4.1 relies on core infrastructure common to nearly every
nearly every operation. This core infrastructure is described in the operation. This core infrastructure is described in the remainder of
remainder of this section. this section.
2.2. RPC and XDR 2.2. RPC and XDR
The NFS version 4.1 (NFSv4.1) protocol is a Remote Procedure Call The NFSv4.1 protocol is a Remote Procedure Call (RPC) application
(RPC) application that uses RPC version 2 and the corresponding that uses RPC version 2 and the corresponding eXternal Data
eXternal Data Representation (XDR) as defined in [3] and [2]. Representation (XDR) as defined in [3] and [2].
2.2.1. RPC-based Security 2.2.1. RPC-based Security
Previous NFS versions have been thought of as having a host-based Previous NFS versions have been thought of as having a host-based
authentication model, where the NFS server authenticates the NFS authentication model, where the NFS server authenticates the NFS
client, and trusts the client to authenticate all users. Actually, client, and trusts the client to authenticate all users. Actually,
NFS has always depended on RPC for authentication. The first form of NFS has always depended on RPC for authentication. The first form of
RPC authentication required a host-based authentication RPC authentication required a host-based authentication
approach. NFSv4.1 also depends on RPC for basic security services, approach. NFSv4.1 also depends on RPC for basic security services,
and mandates RPC support for a user-based authentication model. The and mandates RPC support for a user-based authentication model. The
skipping to change at page 19, line 21 skipping to change at page 20, line 31
privacy and integrity services, GSS-API's authentication service is privacy and integrity services, GSS-API's authentication service is
not used for RPCSEC_GSS's authentication service. Instead, each RPC not used for RPCSEC_GSS's authentication service. Instead, each RPC
request and response header is integrity protected with the GSS-API request and response header is integrity protected with the GSS-API
integrity service, and this allows RPCSEC_GSS to offer per-RPC integrity service, and this allows RPCSEC_GSS to offer per-RPC
authentication and identity. See [4] for more information. authentication and identity. See [4] for more information.
NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and
authentication service. NFSv4.1 servers MUST support RPCSEC_GSS's authentication service. NFSv4.1 servers MUST support RPCSEC_GSS's
privacy service. privacy service.
2.2.1.1.1.2. Security mechanisms for NFS version 4 2.2.1.1.1.2. Security mechanisms for NFSv4.1
RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide
security services. Therefore NFSv4.1 clients and servers MUST security services. Therefore NFSv4.1 clients and servers MUST
support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY. support three security mechanisms: Kerberos V5, SPKM-3, and LIPKEY.
The use of RPCSEC_GSS requires selection of: mechanism, quality of The use of RPCSEC_GSS requires selection of: mechanism, quality of
protection (QOP), and service (authentication, integrity, privacy). protection (QOP), and service (authentication, integrity, privacy).
For the mandated security mechanisms, NFSv4.1 specifies that a QOP of For the mandated security mechanisms, NFSv4.1 specifies that a QOP of
zero (0) is used, leaving it up to the mechanism or the mechanism's zero (0) is used, leaving it up to the mechanism or the mechanism's
configuration to use an appropriate level of protection that QOP zero configuration to use an appropriate level of protection that QOP zero
skipping to change at page 20, line 23 skipping to change at page 21, line 24
1 2 3 4 5 6 1 2 3 4 5 6
------------------------------------------------------------------ ------------------------------------------------------------------
390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes 390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes 390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes 390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes
Note that the number and name of the pseudo flavor are presented here Note that the number and name of the pseudo flavor are presented here
as a mapping aid to the implementor. Because the NFSv4.1 protocol as a mapping aid to the implementor. Because the NFSv4.1 protocol
includes a method to negotiate security and it understands the GSS- includes a method to negotiate security and it understands the GSS-
API mechanism, the pseudo flavor is not needed. The pseudo flavor is API mechanism, the pseudo flavor is not needed. The pseudo flavor is
needed for the NFS version 3 since the security negotiation is done needed for the NFSv3 since the security negotiation is done via the
via the MOUNT protocol as described in [23]. MOUNT protocol as described in [22].
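
As a mapping aid, the Kerberos V5 rows of the table above can be held in a small lookup structure. This is only a sketch: the dictionary and function names are hypothetical, and only the first four columns of each row (pseudo flavor number, name, mechanism OID, RPCSEC_GSS service) are represented, because the column headings fall outside the text shown here.

```python
KRB5_OID = "1.2.840.113554.1.2.2"

# Pseudo flavor number -> (name, mechanism OID, RPCSEC_GSS service),
# copied from the first four columns of the table rows.
PSEUDO_FLAVORS = {
    390003: ("krb5", KRB5_OID, "rpc_gss_svc_none"),
    390004: ("krb5i", KRB5_OID, "rpc_gss_svc_integrity"),
    390005: ("krb5p", KRB5_OID, "rpc_gss_svc_privacy"),
}


def lookup_pseudo_flavor(number):
    """Return (name, oid, service) for a pseudo flavor number, or None."""
    return PSEUDO_FLAVORS.get(number)
```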
2.2.1.1.1.2.2. LIPKEY 2.2.1.1.1.2.2. LIPKEY
The LIPKEY V5 GSS-API mechanism as described in [6] MUST be The LIPKEY V5 GSS-API mechanism as described in [6] MUST be
implemented with the RPCSEC_GSS services as specified in the implemented with the RPCSEC_GSS services as specified in the
following table: following table:
1 2 3 4 5 6 1 2 3 4 5 6
------------------------------------------------------------------ ------------------------------------------------------------------
390006 lipkey 1.3.6.1.5.5.9 rpc_gss_svc_none yes yes 390006 lipkey 1.3.6.1.5.5.9 rpc_gss_svc_none yes yes
skipping to change at page 21, line 27 skipping to change at page 22, line 27
Implementations of security mechanisms will convert nfs@hostname to Implementations of security mechanisms will convert nfs@hostname to
various different forms. For Kerberos V5, LIPKEY, and SPKM-3, the various different forms. For Kerberos V5, LIPKEY, and SPKM-3, the
following form is RECOMMENDED: following form is RECOMMENDED:
nfs/hostname nfs/hostname
2.3. COMPOUND and CB_COMPOUND 2.3. COMPOUND and CB_COMPOUND
A significant departure from the versions of the NFS protocol before A significant departure from the versions of the NFS protocol before
version 4 is the introduction of the COMPOUND procedure. For the NFSv4 is the introduction of the COMPOUND procedure. For the NFSv4
NFSv4 protocol, in all minor versions, there are exactly two RPC protocol, in all minor versions, there are exactly two RPC
procedures, NULL and COMPOUND. The COMPOUND procedure is defined as procedures, NULL and COMPOUND. The COMPOUND procedure is defined as
a series of individual operations and these operations perform the a series of individual operations and these operations perform the
sorts of functions performed by traditional NFS procedures. sorts of functions performed by traditional NFS procedures.
The operations combined within a COMPOUND request are evaluated in The operations combined within a COMPOUND request are evaluated in
order by the server, without any atomicity guarantees. A limited set order by the server, without any atomicity guarantees. A limited set
of facilities exist to pass results from one operation to another. of facilities exist to pass results from one operation to another.
Once an operation returns a failing result, the evaluation ends and Once an operation returns a failing result, the evaluation ends and
the results of all evaluated operations are returned to the client. the results of all evaluated operations are returned to the client.
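
The evaluation rule just described (in order, no atomicity, stop at the first failure, return everything evaluated so far) can be sketched as below. The function name and the representation of an operation as a (name, callable) pair are assumptions made for illustration; passing results between operations is not modeled.

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2


def evaluate_compound(operations):
    """Evaluate operations in order; stop after the first failing one.

    `operations` is a list of (name, handler) pairs, where each handler
    returns (status, payload). Effects of earlier operations are not
    rolled back when a later one fails.
    """
    results = []
    for name, handler in operations:
        status, payload = handler()
        results.append((name, status, payload))
        if status != NFS4_OK:
            break  # remaining operations are never evaluated
    return results
```

For example, a request made of two LOOKUP operations followed by GETATTR, in which the second LOOKUP fails, yields two results and GETATTR is not run.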
skipping to change at page 21, line 50 skipping to change at page 22, line 50
simple or complex requests. These COMPOUND requests allow for a simple or complex requests. These COMPOUND requests allow for a
reduction in the number of RPCs needed for logical file system reduction in the number of RPCs needed for logical file system
operations. For example, multi-component lookup requests can be operations. For example, multi-component lookup requests can be
constructed by combining multiple LOOKUP operations. Those can be constructed by combining multiple LOOKUP operations. Those can be
further combined with operations such as GETATTR, READDIR, or OPEN further combined with operations such as GETATTR, READDIR, or OPEN
plus READ to do more complicated sets of operations without incurring plus READ to do more complicated sets of operations without incurring
additional latency. additional latency.
NFSv4.1 also contains a considerable set of callback operations in NFSv4.1 also contains a considerable set of callback operations in
which the server makes an RPC directed at the client. Callback RPCs which the server makes an RPC directed at the client. Callback RPCs
have a similar structure to that of the normal server requests. For have a similar structure to that of the normal server requests. In
the NFS version 4 protocol callbacks in all minor versions, there are all minor versions of the NFSv4 protocol there are two callback RPC
two RPC procedures, NULL and CB_COMPOUND. The CB_COMPOUND procedure procedures, NULL and CB_COMPOUND. The CB_COMPOUND procedure is
is defined in an analogous fashion to that of COMPOUND with its own defined in an analogous fashion to that of COMPOUND with its own set
set of callback operations. of callback operations.
Addition of new server and callback operations within the COMPOUND and Addition of new server and callback operations within the COMPOUND and
CB_COMPOUND request framework provides a means of extending the protocol CB_COMPOUND request framework provides a means of extending the protocol
in subsequent minor versions. in subsequent minor versions.
Except for a small number of operations needed for session creation, Except for a small number of operations needed for session creation,
server requests and callback requests are performed within the server requests and callback requests are performed within the
context of a session. Sessions provide a client context for every context of a session. Sessions provide a client context for every
request and support robust reply protection for non-idempotent request and support robust reply protection for non-idempotent
requests. requests.
skipping to change at page 22, line 30 skipping to change at page 23, line 30
specific client must be identifiable by the server. specific client must be identifiable by the server.
Each distinct client instance is represented by a client ID. A Each distinct client instance is represented by a client ID. A
client ID is a 64-bit identifier that represents a specific client at a client ID is a 64-bit identifier that represents a specific client at a
given time. The client ID is changed whenever the client re- given time. The client ID is changed whenever the client re-
initializes, and may change when the server re-initializes. Client initializes, and may change when the server re-initializes. Client
IDs are used to support lock identification and crash recovery. IDs are used to support lock identification and crash recovery.
During steady state operation, the client ID associated with each During steady state operation, the client ID associated with each
operation is derived from the session (see Section 2.10) on which the operation is derived from the session (see Section 2.10) on which the
operation is issued. A session is associated with a client ID when operation is sent. A session is associated with a client ID when the
the session is created. session is created.
Unlike NFSv4.0, the only NFSv4.1 operations possible before a client Unlike NFSv4.0, the only NFSv4.1 operations possible before a client
ID is established are those needed to establish the client ID. ID is established are those needed to establish the client ID.
A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION
operation using that client ID (eir_clientid as returned from operation using that client ID (eir_clientid as returned from
EXCHANGE_ID) is required to establish and confirm the client ID on EXCHANGE_ID) is required to establish and confirm the client ID on
the server. Establishment of identification by a new incarnation of the server. Establishment of identification by a new incarnation of
the client also has the effect of immediately releasing any locking the client also has the effect of immediately releasing any locking
state that a previous incarnation of that same client might have had state that a previous incarnation of that same client might have had
skipping to change at page 23, line 40 skipping to change at page 24, line 40
present the same string. The consequences of two clients present the same string. The consequences of two clients
presenting the same string range from one client getting an error presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly to one client having its leased state abruptly and unexpectedly
canceled. canceled.
o The string should be selected so the subsequent incarnations (e.g. o The string should be selected so the subsequent incarnations (e.g.
restarts) of the same client cause the client to present the same restarts) of the same client cause the client to present the same
string. The implementor is cautioned against an approach that string. The implementor is cautioned against an approach that
requires the string to be recorded in a local file because this requires the string to be recorded in a local file because this
precludes the use of the implementation in an environment where precludes the use of the implementation in an environment where
there is no local disk and all file access is from an NFS version there is no local disk and all file access is from an NFSv4.1
4 server. server.
o The string should be the same for each server network address that o The string should be the same for each server network address that
the client accesses, (note: the precise opposite was advised in the client accesses, (note: the precise opposite was advised in
the NFSv4.0 specification [20]). This way, if a server has the NFSv4.0 specification [20]). This way, if a server has
multiple interfaces, the client can trunk traffic over multiple multiple interfaces, the client can trunk traffic over multiple
network paths as described in Section 2.10.4. network paths as described in Section 2.10.4.
o The algorithm for generating the string should not assume that the o The algorithm for generating the string should not assume that the
client's network address will not change, unless the client client's network address will not change, unless the client
implementation knows it is using statically assigned network implementation knows it is using statically assigned network
skipping to change at page 24, line 27 skipping to change at page 25, line 27
o If applicable, the client's statically assigned network address. o If applicable, the client's statically assigned network address.
o Additional information that tends to be unique, such as one or o Additional information that tends to be unique, such as one or
more of: more of:
* The client machine's serial number (for privacy reasons, it is * The client machine's serial number (for privacy reasons, it is
best to perform some one way function on the serial number). best to perform some one way function on the serial number).
* A MAC address (again, a one way function should be performed). * A MAC address (again, a one way function should be performed).
* The timestamp of when the NFS version 4 software was first * The timestamp of when the NFSv4.1 software was first installed
installed on the client (though this is subject to the on the client (though this is subject to the previously
previously mentioned caution about using information that is mentioned caution about using information that is stored in a
stored in a file, because the file might only be accessible file, because the file might only be accessible over NFSv4.1).
over NFS version 4).
* A true random number. However since this number ought to be * A true random number. However since this number ought to be
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
o For a user level NFS version 4 client, it should contain o For a user level NFSv4.1 client, it should contain additional
additional information to distinguish the client from other user information to distinguish the client from other user level
level clients running on the same host, such as a process clients running on the same host, such as a process identifier or
identifier or other unique sequence. other unique sequence.
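
One way to combine the items listed above into a client identification string is sketched below. The function names, the choice of SHA-256 as the one-way function, and the "/" separator are all assumptions for illustration; the draft does not prescribe any of them.

```python
import hashlib


def one_way(value: str) -> str:
    """Apply a one-way function (SHA-256 here, by assumption) to a raw identifier."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_client_owner(serial_number, mac_address,
                       static_address=None, user_level_tag=None) -> bytes:
    """Build a string that stays the same across restarts of the same client.

    The serial number and MAC address are hashed for privacy. A statically
    assigned address is included only if the caller knows it is static.
    user_level_tag distinguishes user-level clients on the same host.
    """
    parts = [one_way(serial_number), one_way(mac_address)]
    if static_address:
        parts.append(static_address)
    if user_level_tag:
        parts.append(user_level_tag)
    return "/".join(parts).encode("utf-8")
```

Because every input is derived from stable properties of the machine rather than from a locally stored file, the same string is produced after a restart even on a client with no local disk.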
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
server restarts. server restarts.
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives a current client ID is no longer valid when it receives a
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether the characteristics of the sessions involved, specifically whether
skipping to change at page 25, line 23 skipping to change at page 26, line 22
NFS4ERR_BADSESSION, since the session in question was lost as part of NFS4ERR_BADSESSION, since the session in question was lost as part of
a server reboot. When the existing client ID is presented to a a server reboot. When the existing client ID is presented to a
server as part of creating a session and that client ID is not server as part of creating a session and that client ID is not
recognized, as would happen after a server restart, the server will recognized, as would happen after a server restart, the server will
reject the request with the error NFS4ERR_STALE_CLIENTID. reject the request with the error NFS4ERR_STALE_CLIENTID.
In the case of the session being persistent, the client will re- In the case of the session being persistent, the client will re-
establish communication using the existing session after the restart. establish communication using the existing session after the restart.
This session will be associated with the existing client ID but no This session will be associated with the existing client ID but no
new operations can be performed on it. Operations that were new operations can be performed on it. Operations that were
previously issued but for which no reply had been received may be previously sent but for which no reply had been received may be re-
reissued to determine whether they had been performed before the sent to determine whether they had been performed before the server
server reboot. The session in this situation is referred to as reboot. The session in this situation is referred to as "dead" and
"dead" and when an operation is sent that has not been performed previously, when an operation is sent that has not been performed previously, i.e. it is
i.e. it is not satisfied from the replay cache, the error not satisfied from the replay cache, the error NFS4ERR_DEADSESSION is
NFS4ERR_DEADSESSION is returned. In this situation, in order to returned. In this situation, in order to perform new operations, the
perform new operations, the client must establish a new session. If client must establish a new session. If an attempt is made to
an attempt is made to establish this new session with the existing establish this new session with the existing client ID, the server
client ID, the server will reject the request with will reject the request with NFS4ERR_STALE_CLIENTID.
NFS4ERR_STALE_CLIENTID.
When NFS4ERR_STALE_CLIENTID is received in either of these When NFS4ERR_STALE_CLIENTID is received in either of these
situations, the client must obtain a new client ID by use of the situations, the client must obtain a new client ID by use of the
EXCHANGE_ID operation, then use that client ID as the basis of a new EXCHANGE_ID operation, then use that client ID as the basis of a new
session, and then proceed to any other necessary recovery for the session, and then proceed to any other necessary recovery for the
server restart case (See Section 8.4.2). server restart case (See Section 8.4.2).
See the detailed descriptions of EXCHANGE_ID (Section 18.35) and See the detailed descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these CREATE_SESSION (Section 18.36) for a complete specification of these
operations. operations.
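The recovery sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of the protocol: the FakeServer class, the establish_session function, and the "RECOVER_STATE" step label are hypothetical names introduced here, and the toy server models only the loss of client IDs on restart.

```python
# Hypothetical sketch of client recovery after a server restart.
OK = "NFS4_OK"
STALE_CLIENTID = "NFS4ERR_STALE_CLIENTID"


class FakeServer:
    """Toy server that forgets every client ID when it restarts."""

    def __init__(self):
        self.next_id = 1   # never reset, so IDs do not repeat across restarts
        self.known = set()

    def restart(self):
        self.known.clear()

    def exchange_id(self, owner):
        clientid = self.next_id
        self.next_id += 1
        self.known.add(clientid)
        return clientid

    def create_session(self, clientid):
        return OK if clientid in self.known else STALE_CLIENTID


def establish_session(server, owner, clientid):
    """Try the existing client ID first; on a stale ID, obtain a new one."""
    steps = []
    status = server.create_session(clientid)
    steps.append(("CREATE_SESSION", status))
    if status == STALE_CLIENTID:
        clientid = server.exchange_id(owner)       # new client ID
        steps.append(("EXCHANGE_ID", OK))
        status = server.create_session(clientid)   # new session on the new ID
        steps.append(("CREATE_SESSION", status))
        steps.append(("RECOVER_STATE", OK))        # placeholder for Section 8.4.2
    return clientid, steps
```

In this sketch the client ID counter survives restart(), which mirrors the requirement that a newly assigned client ID not conflict with one assigned before the restart.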
skipping to change at page 26, line 6 skipping to change at page 27, line 4
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 2.4.1. Upgrade from NFSv4.0 to NFSv4.1
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established
using SETCLIENTID using NFSv4.0, so that an NFSv4.1 client is not using SETCLIENTID using NFSv4.0, so that an NFSv4.1 client is not
forced to delay until lease expiration for locking state established forced to delay until lease expiration for locking state established
by the earlier client using minor version 0. This requires the by the earlier client using minor version 0. This requires the
client_owner4 be constructed the same way as the nfs_client_id4. If client_owner4 be constructed the same way as the nfs_client_id4. If
the latter's contents included the server's network address, and the the latter's contents included the server's network address, and the
NFSv4.1 client does not wish to use a client ID that prevents NFSv4.1 client does not wish to use a client ID that prevents
trunking, it should issue two EXCHANGE_ID operations. The first trunking, it should send two EXCHANGE_ID operations. The first
EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4. EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4.
This will clear the state created by the NFSv4.0 client. The second This will clear the state created by the NFSv4.0 client. The second
EXCHANGE_ID will not have the server's network address. The state EXCHANGE_ID will not have the server's network address. The state
created for the second EXCHANGE_ID will not have to wait for lease created for the second EXCHANGE_ID will not have to wait for lease
expiration, because there will be no state to expire. expiration, because there will be no state to expire.
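As an illustration of the two-step upgrade, the following sketch returns the owner strings a client might send in successive EXCHANGE_ID operations. The function name, its parameters, and the substring test used to detect an embedded server address are assumptions made for this example only; a real client knows how it constructed its nfs_client_id4.

```python
def exchange_id_owners(v40_client_id: bytes, server_addr: bytes,
                       trunkable_owner: bytes, want_trunking: bool = True):
    """Return the co_ownerid values to send, in order (hypothetical helper)."""
    # The first EXCHANGE_ID reuses the NFSv4.0 identity so the server can
    # match it and clear the state left by the NFSv4.0 client.
    owners = [v40_client_id]
    # Assumption: a substring test stands in for "the identity included
    # the server's network address".
    if want_trunking and server_addr in v40_client_id:
        # The second EXCHANGE_ID uses an identity without the server address.
        owners.append(trunkable_owner)
    return owners
```

For example, exchange_id_owners(b"host1/192.0.2.7", b"192.0.2.7", b"host1") yields both identities, while an NFSv4.0 identity that never contained the server address yields only one.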
2.4.2. Server Release of Client ID 2.4.2. Server Release of Client ID
NFSv4.1 introduces a new operation called DESTROY_CLIENTID NFSv4.1 introduces a new operation called DESTROY_CLIENTID
(Section 18.50) which the client SHOULD use to destroy a client ID it (Section 18.50) which the client SHOULD use to destroy a client ID it
skipping to change at page 26, line 35 skipping to change at page 27, line 33
client so that resources are not consumed by those intermittently client so that resources are not consumed by those intermittently
active clients. If the client contacts the server after this active clients. If the client contacts the server after this
release, the server must ensure the client receives the appropriate release, the server must ensure the client receives the appropriate
error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to
establish a new identity. It should be clear that the server must be establish a new identity. It should be clear that the server must be
very hesitant to release a client ID since the resulting work on the very hesitant to release a client ID since the resulting work on the
client to recover from such an event will be the same burden as if client to recover from such an event will be the same burden as if
the server had failed and restarted. Typically a server would not the server had failed and restarted. Typically a server would not
release a client ID unless there had been no activity from that release a client ID unless there had been no activity from that
client for many minutes. As long as there are sessions, opens, client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST not release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.10.1.4 for discussion on releasing the client ID. See Section 2.10.10.1.4 for discussion on releasing
inactive sessions. inactive sessions.
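The release condition above can be expressed as a simple predicate. This is a sketch only: the ClientState record, the may_release_client_id function, and the 600-second idle threshold are assumptions introduced for illustration (the text says only "many minutes").

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Hypothetical per-client bookkeeping kept by a server."""
    sessions: int = 0
    opens: int = 0
    locks: int = 0
    delegations: int = 0
    layouts: int = 0
    wants: int = 0
    idle_seconds: float = 0.0


IDLE_THRESHOLD = 600.0  # assumption: stands in for "many minutes"


def may_release_client_id(c: ClientState) -> bool:
    holds_state = any((c.sessions, c.opens, c.locks,
                       c.delegations, c.layouts, c.wants))
    if holds_state:
        return False  # MUST NOT release while any such state exists
    # Otherwise release is a policy decision; be hesitant.
    return c.idle_seconds >= IDLE_THRESHOLD
```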
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or if it has state, but the lease has expired, the has no state, or if it has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
skipping to change at page 27, line 20 skipping to change at page 28, line 17
privacy, and the same GSS mechanism and principal must be used as privacy, and the same GSS mechanism and principal must be used as
that used when the client ID was created. that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.7.3) and the client sends the (Section 18.35, Section 2.10.7.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.8). GSS SSV mechanism (Section 2.10.8).
o The client ID was established with SP4_SSV protection. Because o The client ID was established with SP4_SSV protection. Because
the SSV might not be persisted across client and server restart, the SSV might not be persisted across client and server restart,
and because the first time a client issues EXCHANGE_ID to a server and because the first time a client sends EXCHANGE_ID to a server
it does not have an SSV, the client MAY issue the subsequent it does not have an SSV, the client MAY send the subsequent
EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with
SP4_MACH_CRED protection, the principal MUST be based on SP4_MACH_CRED protection, the principal MUST be based on
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
integrity or privacy, and the same GSS mechanism and principal integrity or privacy, and the same GSS mechanism and principal
must be used as that used when the client ID was created. must be used as that used when the client ID was created.
If none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
skipping to change at page 28, line 6 skipping to change at page 29, line 4
Owner is defined in the following structure: Owner is defined in the following structure:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned from EXCHANGE_ID. When the so_major_id The Server Owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections each fields are the same in two EXCHANGE_ID results, the connections each
EXCHANGE_ID are sent over can be assumed to address the same Server EXCHANGE_ID are sent over can be assumed to address the same Server
(as defined in Section 1.5). If the so_minor_id fields are also the (as defined in Section 1.6). If the so_minor_id fields are also the
same, then not only do both connections connect to the same server, same, then not only do both connections connect to the same server,
but the session and other state can be shared across both but the session and other state can be shared across both
connections. The reader is cautioned that multiple servers may connections. The reader is cautioned that multiple servers may
deliberately or accidentally claim to have the same so_major_id or deliberately or accidentally claim to have the same so_major_id or
so_major_id/so_minor_id; the reader should examine Section 2.10.4 and so_major_id/so_minor_id; the reader should examine Section 2.10.4 and
Section 18.35. Section 18.35.
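The comparison of two server_owner4 results can be sketched as follows. The ServerOwner tuple mirrors the fields of the structure above; the trunking_relation function and its return labels are hypothetical. The sketch reports only what the two servers claim; as cautioned above, the claim still has to be verified (Section 2.10.4).

```python
from typing import NamedTuple


class ServerOwner(NamedTuple):
    """Mirror of server_owner4 as returned by EXCHANGE_ID."""
    so_minor_id: int
    so_major_id: bytes


def trunking_relation(a: ServerOwner, b: ServerOwner) -> str:
    """Classify what two EXCHANGE_ID results claim about their connections."""
    if a.so_major_id != b.so_major_id:
        return "different-server"
    if a.so_minor_id != b.so_minor_id:
        return "same-server"               # same server, state not shared
    return "same-server-shared-state"      # sessions and state can be shared
```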
The considerations for generating a so_major_id are similar to those The considerations for generating a so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.4). (see Section 2.10.4).
2.6. Security Service Negotiation 2.6. Security Service Negotiation
With the NFS version 4 server potentially offering multiple security With the NFSv4.1 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The mechanism is to be used for its communication with the server. The
NFS server may have multiple points within its file system namespace NFS server may have multiple points within its file system namespace
that are available for use by NFS clients. These points can be that are available for use by NFS clients. These points can be
considered security policy boundaries, and in some NFS considered security policy boundaries, and in some NFS
implementations are tied to NFS export points. In turn the NFS implementations are tied to NFS export points. In turn the NFS
server may be configured such that each of these security policy server may be configured such that each of these security policy
boundaries may have different or multiple security mechanisms in use. boundaries may have different or multiple security mechanisms in use.
The security negotiation between client and server must be done with The security negotiation between client and server must be done with
skipping to change at page 29, line 6 skipping to change at page 30, line 4
flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of
protection, and an RPCSEC_GSS service. protection, and an RPCSEC_GSS service.
2.6.2. SECINFO and SECINFO_NO_NAME 2.6.2. SECINFO and SECINFO_NO_NAME
The SECINFO and SECINFO_NO_NAME operations allow the client to The SECINFO and SECINFO_NO_NAME operations allow the client to
determine, on a per filehandle basis, what security tuple is to be determine, on a per filehandle basis, what security tuple is to be
used for server access. In general, the client will not have to use used for server access. In general, the client will not have to use
either operation except during initial communication with the server either operation except during initial communication with the server
or when the client crosses security policy boundaries at the server. or when the client crosses security policy boundaries at the server.
However, the server's policies may also change at any time and force However, the server's policies may also change at any time and force
the client to negotiate a new security tuple. the client to negotiate a new security tuple.
Where the use of different security tuples would affect the type of Where the use of different security tuples would affect the type of
access that would be allowed if a request was issued over the same access that would be allowed if a request was sent over the same
connection used for the SECINFO or SECINFO_NO_NAME operation (e.g. connection used for the SECINFO or SECINFO_NO_NAME operation (e.g.
read-only vs. read-write access), security tuples that allow greater read-only vs. read-write access), security tuples that allow greater
access should be presented first. Where the general level of access access should be presented first. Where the general level of access
is the same and different security flavors limit the range of is the same and different security flavors limit the range of
principals whose privileges are recognized (e.g. allowing or principals whose privileges are recognized (e.g. allowing or
disallowing root access), flavors supporting the greatest range of disallowing root access), flavors supporting the greatest range of
principals should be listed first. principals should be listed first.
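The ordering preference described above amounts to a two-key sort. In the sketch below, the SecEntry record and its numeric ranks are assumptions made for illustration; the protocol does not define such ranks, and a server would derive the ordering from its own policy.

```python
from typing import NamedTuple


class SecEntry(NamedTuple):
    """Hypothetical description of one security tuple offered by a server."""
    flavor: str
    access_level: int     # assumed rank: higher means greater access
    principal_range: int  # assumed rank: higher means more principals honored


def order_secinfo(entries):
    """Greater access first; for equal access, wider principal range first."""
    return sorted(entries,
                  key=lambda e: (-e.access_level, -e.principal_range))
```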
2.6.3. Security Error 2.6.3. Security Error
Based on the assumption that each NFS version 4 client and server Based on the assumption that each NFSv4.1 client and server must
must support a minimum set of security (i.e., LIPKEY, SPKM-3, and support a minimum set of security (i.e., LIPKEY, SPKM-3, and
Kerberos-V5 all under RPCSEC_GSS), the NFS client will initiate file Kerberos-V5 all under RPCSEC_GSS), the NFS client will initiate file
access to the server with one of the minimal security tuples. During access to the server with one of the minimal security tuples. During
communication with the server, the client may receive an NFS error of communication with the server, the client may receive an NFS error of
NFS4ERR_WRONGSEC. This error allows the server to notify the client NFS4ERR_WRONGSEC. This error allows the server to notify the client
that the security tuple currently being used contravenes the server's that the security tuple currently being used contravenes the server's
security policy. The client is then responsible for determining (see security policy. The client is then responsible for determining (see
Section 2.6.3.1) what security tuples are available at the server and Section 2.6.3.1) what security tuples are available at the server and
choosing one which is appropriate for the client. choosing one which is appropriate for the client.
2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME 2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME
skipping to change at page 31, line 13 skipping to change at page 32, line 9
time the filehandle was obtained. time the filehandle was obtained.
Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in
response to the put filehandle operation if the operation is response to the put filehandle operation if the operation is
immediately followed by a LOOKUP or an OPEN by component name. immediately followed by a LOOKUP or an OPEN by component name.
2.6.3.1.4. Put Filehandle Operation + LOOKUPP 2.6.3.1.4. Put Filehandle Operation + LOOKUPP
Since SECINFO only works its way down, there is no way LOOKUPP can Since SECINFO only works its way down, there is no way LOOKUPP can
return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME
solves this issue because via style SECINFO_STYLE4_PARENT, it works solves this issue because via style SECINFO_STYLE4_PARENT, it works in
in the opposite direction as SECINFO. As with Section 2.6.3.1.3, the the opposite direction as SECINFO. As with Section 2.6.3.1.3, the
put filehandle operation must not return NFS4ERR_WRONGSEC whenever it put filehandle operation must not return NFS4ERR_WRONGSEC whenever it
is followed by LOOKUPP. If the server does not support is followed by LOOKUPP. If the server does not support
SECINFO_NO_NAME, the client's only recourse is to issue the put SECINFO_NO_NAME, the client's only recourse is to send the put
filehandle operation, LOOKUPP, GETFH sequence of operations with filehandle operation, LOOKUPP, GETFH sequence of operations with
every security tuple it supports. every security tuple it supports.
Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server
MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle
operation if the operation is immediately followed by a LOOKUPP. operation if the operation is immediately followed by a LOOKUPP.
2.6.3.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME 2.6.3.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME
A security sensitive client is allowed to choose a strong security A security sensitive client is allowed to choose a strong security
skipping to change at page 32, line 19 skipping to change at page 33, line 15
The security policy enforcement applies to the filehandle specified The security policy enforcement applies to the filehandle specified
in the put filehandle operation. Therefore the put filehandle in the put filehandle operation. Therefore the put filehandle
operation must return NFS4ERR_WRONGSEC when there is a security tuple operation must return NFS4ERR_WRONGSEC when there is a security tuple
mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as an
allowable error to every other operation. allowable error to every other operation.
A COMPOUND containing the series put filehandle operation + A COMPOUND containing the series put filehandle operation +
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
for the client to recover from NFS4ERR_WRONGSEC. for the client to recover from NFS4ERR_WRONGSEC.
The NFSv4.1 server MUST not return NFS4ERR_WRONGSEC to any operation The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name). component name).
2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME 2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME
Placing an operation that uses the current filehandle after SECINFO Placing an operation that uses the current filehandle after SECINFO
or SECINFO_NO_NAME seemingly introduces an issue with what error to or SECINFO_NO_NAME seemingly introduces an issue with what error to
return when the security tuple of the request is not allowed for the return when the security tuple of the request is not allowed for the
operation that uses the current filehandle. For example, suppose a operation that uses the current filehandle. For example, suppose a
client sends a COMPOUND procedure containing this series of client sends a COMPOUND procedure containing this series of
skipping to change at page 32, line 42 skipping to change at page 33, line 38
By rule (see Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME By rule (see Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME
can return NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ can return NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ
cannot return NFS4ERR_WRONGSEC. The issue is resolved by the fact cannot return NFS4ERR_WRONGSEC. The issue is resolved by the fact
that SECINFO and SECINFO_NO_NAME consume the current filehandle. that SECINFO and SECINFO_NO_NAME consume the current filehandle.
This leaves no current filehandle for READ to use, and READ returns This leaves no current filehandle for READ to use, and READ returns
NFS4ERR_NOFILEHANDLE. NFS4ERR_NOFILEHANDLE.
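The effect of SECINFO and SECINFO_NO_NAME consuming the current filehandle can be seen in a toy COMPOUND evaluator. This is a sketch under simplifying assumptions: run_compound is a hypothetical function, operation arguments are ignored, security checks are not modeled, and only the three operations discussed here are recognized.

```python
OK = "NFS4_OK"
NOFILEHANDLE = "NFS4ERR_NOFILEHANDLE"


def run_compound(ops, fh):
    """Evaluate a list of operation names, tracking the current filehandle."""
    current_fh = None
    results = []
    for op in ops:
        if op == "PUTFH":
            current_fh = fh
            status = OK
        elif op in ("SECINFO", "SECINFO_NO_NAME"):
            if current_fh is None:
                status = NOFILEHANDLE
            else:
                status = OK
                current_fh = None  # the operation consumes the filehandle
        elif op == "READ":
            status = OK if current_fh is not None else NOFILEHANDLE
        else:
            raise ValueError("operation not modeled in this sketch: " + op)
        results.append((op, status))
        if status != OK:
            break  # COMPOUND processing stops at the first failing operation
    return results
```

Running run_compound(["PUTFH", "SECINFO_NO_NAME", "READ"], b"fh") ends with READ failing with NFS4ERR_NOFILEHANDLE, so no operation in the series ever needs to return NFS4ERR_WRONGSEC.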
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFS version 4 protocol contains the rules and need arises, the NFSv4.1 protocol contains the rules and framework to
framework to allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version documented in a standards track RFC. Therefore, each minor version
number will correspond to an RFC. Minor version zero of the NFS number will correspond to an RFC. Minor version zero of the NFSv4
version 4 protocol is represented by [20], and minor version one is protocol is represented by [20], and minor version one is represented
represented by this document [[Comment.1: change "document" to "RFC" by this document [[Comment.1: RFC Editor: change "document" to "RFC"
when we publish]] . The COMPOUND and CB_COMPOUND procedures support when we publish]] . The COMPOUND and CB_COMPOUND procedures support
the encoding of the minor version being requested by the client. the encoding of the minor version being requested by the client.
The following items represent the basic rules for the development of The following items represent the basic rules for the development of
minor versions. Note that a future minor version may decide to minor versions. Note that a future minor version may decide to
modify or add to the following rules as part of the minor version modify or add to the following rules as part of the minor version
definition. definition.
1. Procedures are not added or deleted 1. Procedures are not added or deleted
To maintain the general RPC model, NFS version 4 minor versions To maintain the general RPC model, NFSv4 minor versions will not
will not add to or delete procedures from the NFS program. add to or delete procedures from the NFS program.
2. Minor versions may add operations to the COMPOUND and 2. Minor versions may add operations to the COMPOUND and
CB_COMPOUND procedures. CB_COMPOUND procedures.
The addition of operations to the COMPOUND and CB_COMPOUND The addition of operations to the COMPOUND and CB_COMPOUND
procedures does not affect the RPC model. procedures does not affect the RPC model.
* Minor versions may append attributes to the bitmap4 that * Minor versions may append attributes to the bitmap4 that
represents sets of attributes and the fattr4 that represents represents sets of attributes and the fattr4 that represents
sets of attribute values. sets of attribute values.
skipping to change at page 36, line 14 skipping to change at page 37, line 7
2.9. Transport Layers 2.9. Transport Layers
2.9.1. Required and Recommended Properties of Transports 2.9.1. Required and Recommended Properties of Transports
NFSv4.1 works over RDMA and non-RDMA-based transports with the NFSv4.1 works over RDMA and non-RDMA-based transports with the
following attributes: following attributes:
o The transport supports reliable delivery of data, which NFSv4.1 o The transport supports reliable delivery of data, which NFSv4.1
requires but neither NFSv4.1 nor RPC has facilities for ensuring. requires but neither NFSv4.1 nor RPC has facilities for ensuring.
[24] [23]
o The transport delivers data in the order it was sent. Ordered o The transport delivers data in the order it was sent. Ordered
delivery simplifies detection of transmit errors, and simplifies delivery simplifies detection of transmit errors, and simplifies
the sending of arbitrary sized requests and responses, via the the sending of arbitrary sized requests and responses, via the
record marking protocol [3]. record marking protocol [3].
Where an NFS version 4 implementation supports operation over the IP Where an NFSv4.1 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols. At the the IETF-approved congestion control transport protocols. At the
time this document was written, the only two transports that had the time this document was written, the only two transports that had the
above attributes were TCP and SCTP. To enhance the possibilities for above attributes were TCP and SCTP. To enhance the possibilities for
interoperability, an NFS version 4 implementation MUST support interoperability, an NFSv4.1 implementation MUST support operation
operation over the TCP transport protocol. over the TCP transport protocol.
Even if NFS version 4 is used over a non-IP network protocol, it is Even if NFSv4.1 is used over a non-IP network protocol, it is
RECOMMENDED that the transport support congestion control. RECOMMENDED that the transport support congestion control.
It is permissible for a connectionless transport to be used under It is permissible for a connectionless transport to be used under
NFSv4.1; however, reliable and in-order delivery of data by the NFSv4.1; however, reliable and in-order delivery of data by the
connectionless transport are still required. NFSv4.1 assumes that a connectionless transport are still required. NFSv4.1 assumes that a
client transport address and server transport address used to send client transport address and server transport address used to send
data over a transport together constitute a connection, even if the data over a transport together constitute a connection, even if the
underlying transport eschews the concept of a connection. underlying transport eschews the concept of a connection.
2.9.2. Client and Server Transport Behavior 2.9.2. Client and Server Transport Behavior
skipping to change at page 37, line 9 skipping to change at page 37, line 51
eliminating the need for connection setup handshakes. eliminating the need for connection setup handshakes.
3. The NFSv4.1 callback model differs from NFSv4.0, and requires the 3. The NFSv4.1 callback model differs from NFSv4.0, and requires the
client and server to maintain a client-created backchannel (see client and server to maintain a client-created backchannel (see
Section 2.10.3.1) for the server to use. Section 2.10.3.1) for the server to use.
In order to reduce congestion, if a connection-oriented transport is In order to reduce congestion, if a connection-oriented transport is
used, and the request is not the NULL procedure, used, and the request is not the NULL procedure,
o A requester MUST NOT retry a request unless the connection the o A requester MUST NOT retry a request unless the connection the
request was issued over was lost before the reply was received. request was sent over was lost before the reply was received.
o A replier MUST NOT silently drop a request, even if the request is o A replier MUST NOT silently drop a request, even if the request is
a retry. (The silent drop behavior of RPCSEC_GSS [4] does not a retry. (The silent drop behavior of RPCSEC_GSS [4] does not
apply because this behavior happens at the RPCSEC_GSS layer, a apply because this behavior happens at the RPCSEC_GSS layer, a
lower layer in the request processing). Instead, the replier lower layer in the request processing). Instead, the replier
SHOULD return an appropriate error (see Section 2.10.5.1) or it SHOULD return an appropriate error (see Section 2.10.5.1) or it
MAY disconnect the connection. MAY disconnect the connection.
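The requester-side rule above reduces to a small predicate. The function name and parameter names below are hypothetical; the sketch captures only the retry prohibition stated in the first bullet and says nothing about when a retry is otherwise advisable.

```python
def retry_prohibited(connection_oriented: bool,
                     is_null_procedure: bool,
                     connection_lost_before_reply: bool) -> bool:
    """Return True when the requester MUST NOT retry the request."""
    if not connection_oriented or is_null_procedure:
        return False  # the rule applies only to non-NULL requests on
                      # connection-oriented transports
    # Retrying is allowed only if the connection the request was sent
    # over was lost before the reply was received.
    return not connection_lost_before_reply
```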
When sending a reply, the replier MUST send the reply to the same When sending a reply, the replier MUST send the reply to the same
full network address (e.g. if using an IP-based transport, the source full network address (e.g. if using an IP-based transport, the source
port of the requester is part of the full network address) that the port of the requester is part of the full network address) that the
requester issued the request from. If using a connection-oriented requester sent the request from. If using a connection-oriented
transport, replies MUST be sent on the same connection the request transport, replies MUST be sent on the same connection the request
was received from. was received from.
If a connection is dropped after the replier receives the request but If a connection is dropped after the replier receives the request but
before the replier sends the reply, the replier might have a pending before the replier sends the reply, the replier might have a pending
reply. If a connection is established with the same source and reply. If a connection is established with the same source and
destination full network address as the dropped connection, then the destination full network address as the dropped connection, then the
replier MUST NOT send the reply until the client retries the request. replier MUST NOT send the reply until the client retries the request.
The reason for this prohibition is that the client MAY retry a The reason for this prohibition is that the client MAY retry a
request over a different connection than is associated with the request over a different connection than is associated with the
skipping to change at page 38, line 4 skipping to change at page 38, line 47
request. If the requester retransmitted a request, the additional request. If the requester retransmitted a request, the additional
credit consumed on the server might lead to RDMA connection credit consumed on the server might lead to RDMA connection
failure unless the client accounted for it and decreased its failure unless the client accounted for it and decreased its
available credit, leading to wasted resources. available credit, leading to wasted resources.
o RDMA credits present a new issue to the reply cache in NFSv4.1. o RDMA credits present a new issue to the reply cache in NFSv4.1.
The reply cache may be used when a connection within a session is The reply cache may be used when a connection within a session is
lost, such as after the client reconnects. Credit information is lost, such as after the client reconnects. Credit information is
a dynamic property of the RDMA connection, and stale values must a dynamic property of the RDMA connection, and stale values must
not be replayed from the cache. This implies that the reply cache not be replayed from the cache. This implies that the reply cache
contents must not be blindly used when replies are issued from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, the NFSv4.1 requester is not allowed to stop waiting for In addition, the NFSv4.1 requester is not allowed to stop waiting for
a reply, as described in Section 2.10.5.2. a reply, as described in Section 2.10.5.2.
2.9.3. Ports 2.9.3. Ports
Historically, NFS version 2 and version 3 servers have listened over Historically, NFSv3 servers have listened over TCP port 2049. The
TCP port 2049. The registered port 2049 [25] for the NFS protocol registered port 2049 [24] for the NFS protocol should be the default
should be the default configuration. NFSv4.1 clients SHOULD NOT use configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
the RPC binding protocols as described in [26]. protocols as described in [25].
2.10. Session 2.10. Session
2.10.1. Motivation and Overview 2.10.1. Motivation and Overview
Previous versions and minor versions of NFS have suffered from the Previous versions and minor versions of NFS have suffered from the
following: following:
o Lack of support for exactly once semantics (EOS). This includes o Lack of support for exactly once semantics (EOS). This includes
lack of support for EOS through server failure and recovery. lack of support for EOS through server failure and recovery.
skipping to change at page 38, line 47 skipping to change at page 39, line 41
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it o EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
through server failure and recovery. One reason that previous through server failure and recovery. One reason that previous
revisions of NFS did not support EOS was because some EOS revisions of NFS did not support EOS was because some EOS
approaches often limited parallelism. As will be explained in approaches often limited parallelism. As will be explained in
Section 2.10.5, NFSv4.1 supports both EOS and unlimited Section 2.10.5, NFSv4.1 supports both EOS and unlimited
parallelism. parallelism.
o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates o The NFSv4.1 client (defined in Section 1.6, Paragraph 2) creates
transport connections and provides them to the server to use for transport connections and provides them to the server to use for
sending callback requests, thus solving the firewall issue sending callback requests, thus solving the firewall issue
(Section 18.34). Races between responses from client requests, (Section 18.34). Races between responses from client requests,
and callbacks caused by the requests are detected via the and callbacks caused by the requests are detected via the
session's sequencing properties which are a consequence of EOS session's sequencing properties which are a consequence of EOS
(Section 2.10.5.3). (Section 2.10.5.3).
o The NFSv4.1 client can add an arbitrary number of connections to o The NFSv4.1 client can add an arbitrary number of connections to
the session, and thus provide trunking (Section 2.10.4). the session, and thus provide trunking (Section 2.10.4).
skipping to change at page 40, line 48 skipping to change at page 41, line 42
2.10.2.2. Client ID and Session Association 2.10.2.2. Client ID and Session Association
Each client ID (Section 2.4) can have zero or more active sessions. Each client ID (Section 2.4) can have zero or more active sessions.
A client ID and associated session are required to perform file A client ID and associated session are required to perform file
access in NFSv4.1. Each time a session is used (whether by a client access in NFSv4.1. Each time a session is used (whether by a client
sending a request to the server, or the client replying to a callback sending a request to the server, or the client replying to a callback
request from the server), the state leased to its associated client request from the server), the state leased to its associated client
ID is automatically renewed. ID is automatically renewed.
State such as share reservations, locks, delegations, and layouts State such as share reservations, locks, delegations, and layouts
(Section 1.4.4) is tied to the client ID. Client state is not tied (Section 1.5.4) is tied to the client ID. Client state is not tied
to any individual session. Successive state changing operations from to any individual session. Successive state changing operations from
a given state owner MAY go over different sessions, provided the a given state owner MAY go over different sessions, provided the
session is associated with the same client ID. A callback MAY arrive session is associated with the same client ID. A callback MAY arrive
over a different session than the session that originally over a different session than the session that originally
acquired the state pertaining to the callback. For example, if acquired the state pertaining to the callback. For example, if
session A is used to acquire a delegation, a request to recall the session A is used to acquire a delegation, a request to recall the
delegation MAY arrive over session B if both sessions are associated delegation MAY arrive over session B if both sessions are associated
with the same client ID. Section 2.10.7.1 and Section 2.10.7.2 with the same client ID. Section 2.10.7.1 and Section 2.10.7.2
discuss the security considerations around callbacks. discuss the security considerations around callbacks.
skipping to change at page 42, line 38 skipping to change at page 43, line 30
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. NFSv4.1 servers MUST support trunking. trunking. NFSv4.1 servers MUST support trunking.
Session trunking is essentially the association of multiple Session trunking is essentially the association of multiple
connections, each with a potentially different target network connections, each with a potentially different target network
address, to the same session. address, to the same session.
Client ID trunking is the association of multiple sessions to the Client ID trunking is the association of multiple sessions to the
same client ID, major server owner ID (Section 2.5), and server scope same client ID, major server owner ID (Section 2.5), and server scope
(Section 11.6.7). When two servers return the same major server (Section 11.7.7). When two servers return the same major server
owner and server scope it means the two servers are cooperating on owner and server scope it means the two servers are cooperating on
locking state management which is a prerequisite for client ID locking state management which is a prerequisite for client ID
trunking. trunking.
Understanding and distinguishing session and client ID trunking Understanding and distinguishing session and client ID trunking
requires understanding how the results of the EXCHANGE_ID requires understanding how the results of the EXCHANGE_ID
(Section 18.35) operation identify a server. Suppose a client issues (Section 18.35) operation identify a server. Suppose a client sends
EXCHANGE_ID over two different connections each with a possibly EXCHANGE_ID over two different connections each with a possibly
different target network address but each EXCHANGE_ID with the same different target network address but each EXCHANGE_ID with the same
value in the eia_clientowner field. If the same NFSv4.1 server is value in the eia_clientowner field. If the same NFSv4.1 server is
listening over each connection, then each EXCHANGE_ID result MUST listening over each connection, then each EXCHANGE_ID result MUST
return the same values of eir_clientid, eir_server_owner.so_major_id return the same values of eir_clientid, eir_server_owner.so_major_id
and eir_server_scope. The client can then treat each connection as and eir_server_scope. The client can then treat each connection as
referring to the same server (subject to verification, see referring to the same server (subject to verification, see
Paragraph 5 later in this section), and it can use each connection to Paragraph 5 later in this section), and it can use each connection to
trunk requests and replies. The question is whether session trunking trunk requests and replies. The question is whether session trunking
and/or client ID trunking applies. and/or client ID trunking applies.
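
The matching rules this section gives for the two trunking types can be sketched as a comparison of two EXCHANGE_ID results. This is a minimal illustration, not a definitive implementation. It assumes both requests carried the same eia_clientowner. The ExchangeIdResult type and the trunking_kind function are hypothetical names; only the field names come from the text. The result says what the client is permitted to do, and the verification step described later in this section is still required.

```python
from typing import NamedTuple


class ExchangeIdResult(NamedTuple):
    """Hypothetical container for the EXCHANGE_ID result fields named in the text."""
    eir_clientid: int
    so_major_id: bytes       # eir_server_owner.so_major_id
    so_minor_id: int         # eir_server_owner.so_minor_id
    eir_server_scope: bytes


def trunking_kind(a: ExchangeIdResult, b: ExchangeIdResult) -> str:
    """Classify which trunking type two EXCHANGE_ID results permit.

    Assumes both EXCHANGE_ID requests used the same eia_clientowner.
    Returns "session", "client_id", or "none".
    """
    same_server = (
        a.eir_clientid == b.eir_clientid
        and a.so_major_id == b.so_major_id
        and a.eir_server_scope == b.eir_server_scope
    )
    if not same_server:
        return "none"
    # A matching minor ID additionally permits session trunking.
    if a.so_minor_id == b.so_minor_id:
        return "session"
    return "client_id"
```

A client that is permitted session trunking may still choose not to use it and send CREATE_SESSION on the new connection instead.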
skipping to change at page 43, line 18 skipping to change at page 44, line 15
Session Trunking If the eia_clientowner argument is the same in two Session Trunking If the eia_clientowner argument is the same in two
different EXCHANGE_ID requests, and the eir_clientid, different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and
eir_server_scope results match in both EXCHANGE_ID results, then eir_server_scope results match in both EXCHANGE_ID results, then
the client is permitted to perform session trunking. If the the client is permitted to perform session trunking. If the
client has no session mapping to the tuple of eir_clientid, client has no session mapping to the tuple of eir_clientid,
eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_major_id, eir_server_scope,
eir_server_owner.so_minor_id, then it creates the session via a eir_server_owner.so_minor_id, then it creates the session via a
CREATE_SESSION operation over one of the connections, which CREATE_SESSION operation over one of the connections, which
associates the connection to the session. If there is a session associates the connection to the session. If there is a session
for the tuple, the client can issue BIND_CONN_TO_SESSION to for the tuple, the client can send BIND_CONN_TO_SESSION to
associate the connection to the session. Or if the client does associate the connection to the session. Or if the client does
not want to use session trunking, it can invoke CREATE_SESSION on not want to use session trunking, it can invoke CREATE_SESSION on
the connection. the connection.
Client ID Trunking If the eia_clientowner argument is the same in Client ID Trunking If the eia_clientowner argument is the same in
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
skipping to change at page 44, line 16 skipping to change at page 45, line 12
a client ID is created, the client SHOULD specify that a client ID is created, the client SHOULD specify that
BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or
SP4_MACH_CRED (Section 18.35) state protection options. For SP4_MACH_CRED (Section 18.35) state protection options. For
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (Section 18.47) SSV) that is established via the SET_SSV (Section 18.47)
operation. operation.
When a new connection is associated with the session (via the When a new connection is associated with the session (via the
BIND_CONN_TO_SESSION operation, see Section 18.34), if the client BIND_CONN_TO_SESSION operation, see Section 18.34), if the client
specified SP4_SSV state protection for the BIND_CONN_TO_SESSION specified SP4_SSV state protection for the BIND_CONN_TO_SESSION
operation, the client MUST issue the BIND_CONN_TO_SESSION with operation, the client MUST send the BIND_CONN_TO_SESSION with
RPCSEC_GSS protection, using integrity or privacy, and a RPCSEC_GSS protection, using integrity or privacy, and a
RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The
RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36). RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
skipping to change at page 44, line 42 skipping to change at page 45, line 38
BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or
privacy, using the same credential that was used when the client privacy, using the same credential that was used when the client
ID was created. Mutual authentication via RPCSEC_GSS assures the ID was created. Mutual authentication via RPCSEC_GSS assures the
client that the connection is associated with the correct session client that the connection is associated with the correct session
of the correct server. of the correct server.
o For client ID trunking, the client has at least two options for o For client ID trunking, the client has at least two options for
verifying that the same client ID obtained from two different verifying that the same client ID obtained from two different
EXCHANGE_ID operations came from the same server. The first EXCHANGE_ID operations came from the same server. The first
option is to use RPCSEC_GSS authentication when issuing each option is to use RPCSEC_GSS authentication when issuing each
EXCHANGE_ID. Each time an EXCHANGE_ID is issued with RPCSEC_GSS EXCHANGE_ID. Each time an EXCHANGE_ID is sent with RPCSEC_GSS
authentication, the client notes the principal name of the GSS authentication, the client notes the principal name of the GSS
target. If the EXCHANGE_ID results indicate client ID trunking is target. If the EXCHANGE_ID results indicate client ID trunking is
possible, and the GSS targets' principal names are the same, the possible, and the GSS targets' principal names are the same, the
servers are the same and client ID trunking is allowed. servers are the same and client ID trunking is allowed.
The second option for verification is to use SP4_SSV protection. The second option for verification is to use SP4_SSV protection.
When the client issues EXCHANGE_ID it specifies SP4_SSV When the client sends EXCHANGE_ID it specifies SP4_SSV protection.
protection. The first EXCHANGE_ID the client issues always has to The first EXCHANGE_ID the client sends always has to be confirmed
be confirmed by a CREATE_SESSION call. The client then issues by a CREATE_SESSION call. The client then sends SET_SSV. Later
SET_SSV. Later the client issues EXCHANGE_ID to a second the client sends EXCHANGE_ID to a second destination network
destination network address than the first EXCHANGE_ID was issued address than the first EXCHANGE_ID was sent with. The client
with. The client checks that each EXCHANGE_ID reply has the same checks that each EXCHANGE_ID reply has the same eir_clientid,
eir_clientid, eir_server_owner.so_major_id, and eir_server_scope. eir_server_owner.so_major_id, and eir_server_scope. If so, the
If so, the client verifies the claim by issuing a CREATE_SESSION client verifies the claim by issuing a CREATE_SESSION to the
to the second destination address, protected with RPCSEC_GSS second destination address, protected with RPCSEC_GSS integrity
integrity using an RPCSEC_GSS handle returned by the second using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If
EXCHANGE_ID. If the server accepts the CREATE_SESSION request, and the server accepts the CREATE_SESSION request, and if the client
if the client verifies the RPCSEC_GSS verifier and integrity verifies the RPCSEC_GSS verifier and integrity codes, then the
codes, then the client has proof the second server knows the SSV, client has proof the second server knows the SSV, and thus the two
and thus the two servers are the same for the purposes of client servers are the same for the purposes of client ID trunking.
ID trunking.
2.10.5. Exactly Once Semantics 2.10.5. Exactly Once Semantics
Via the session, NFSv4.1 offers exactly once semantics (EOS) for Via the session, NFSv4.1 offers exactly once semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is issued with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement applies regardless of whether the request is exactly once. This requirement applies regardless of whether the request is
issued with reply caching specified (see Section 2.10.5.1.2). The sent with reply caching specified (see Section 2.10.5.1.2). The
requirement holds even if the requester is issuing the request over a requirement holds even if the requester is issuing the request over a
session created between a pNFS data client and pNFS data server. The session created between a pNFS data client and pNFS data server. The
rationale for this requirement is understood by categorizing requests rationale for this requirement is understood by categorizing requests
into three classifications: into three classifications:
o Nonidempotent requests. o Nonidempotent requests.
o Idempotent modifying requests. o Idempotent modifying requests.
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
skipping to change at page 45, line 51 skipping to change at page 46, line 46
execution succeeds, the re-execution will fail. If the replier execution succeeds, the re-execution will fail. If the replier
returns the result from the re-execution, this result is incorrect. returns the result from the re-execution, this result is incorrect.
Therefore, EOS is required for nonidempotent requests. Therefore, EOS is required for nonidempotent requests.
An example of an idempotent modifying request is a COMPOUND request An example of an idempotent modifying request is a COMPOUND request
containing a WRITE operation. Repeated execution of the same WRITE containing a WRITE operation. Repeated execution of the same WRITE
has the same effect as execution of that write once. Nevertheless, has the same effect as execution of that write once. Nevertheless,
enforcing EOS for WRITEs and other idempotent modifying enforcing EOS for WRITEs and other idempotent modifying
requests is necessary to avoid data corruption. requests is necessary to avoid data corruption.
Suppose a client issues WRITEs A, and B to a noncompliant server that Suppose a client sends WRITEs A and B to a noncompliant server that
does not enforce EOS, and receives no response, perhaps due to a does not enforce EOS, and receives no response, perhaps due to a
network partition. The client reconnects to the server and re-issues network partition. The client reconnects to the server and re-sends
both WRITEs. Now, the server has outstanding two instances of each both WRITEs. Now, the server has outstanding two instances of each
of A and B. The server can be in a situation in which it executes and of A and B. The server can be in a situation in which it executes and
replies to the retries of A and B, while the first A and B are still replies to the retries of A and B, while the first A and B are still
waiting in the server's I/O system for some resource. Upon receiving waiting in the server's I/O system for some resource. Upon receiving
the replies to the second attempts of WRITEs A and B, the client the replies to the second attempts of WRITEs A and B, the client
believes its writes are done so it is free to issue WRITE D which believes its writes are done so it is free to send WRITE D which
overlaps the range of one or both of A and B. If A or B are overlaps the range of one or both of A and B. If A or B are
subsequently executed for the second time, then what has been written subsequently executed for the second time, then what has been written
by D can be overwritten and thus corrupted. by D can be overwritten and thus corrupted.
An example of an idempotent non-modifying request is a COMPOUND An example of an idempotent non-modifying request is a COMPOUND
containing SEQUENCE, PUTFH, READLINK and nothing else. The re- containing SEQUENCE, PUTFH, READLINK and nothing else. The re-
execution of such a request will not cause data corruption, or execution of such a request will not cause data corruption, or
produce an incorrect result. Nonetheless, to keep the implementation produce an incorrect result. Nonetheless, to keep the implementation
simple, the replier MUST enforce EOS for all requests whether simple, the replier MUST enforce EOS for all requests whether
idempotent and non-modifying or not. idempotent and non-modifying or not.
skipping to change at page 46, line 42 skipping to change at page 47, line 37
ONC RPC transaction identifier (XID). Section 2.10.5.1 explains the ONC RPC transaction identifier (XID). Section 2.10.5.1 explains the
shortcomings of the XID as a basis for a reply cache and describes shortcomings of the XID as a basis for a reply cache and describes
how NFSv4.1 sessions improve upon the XID. how NFSv4.1 sessions improve upon the XID.
2.10.5.1. Slot Identifiers and Reply Cache 2.10.5.1. Slot Identifiers and Reply Cache
The RPC layer provides a transaction ID (XID), which, while required The RPC layer provides a transaction ID (XID), which, while required
to be unique, is not convenient for tracking requests for two to be unique, is not convenient for tracking requests for two
reasons. First, the XID is only meaningful to the requester; it reasons. First, the XID is only meaningful to the requester; it
cannot be interpreted by the replier except to test for equality with cannot be interpreted by the replier except to test for equality with
previously issued requests. When consulting an RPC-based duplicate previously sent requests. When consulting an RPC-based duplicate
request cache, the opaqueness of the XID requires a computationally request cache, the opaqueness of the XID requires a computationally
expensive lookup (often via a hash that includes XID and source expensive lookup (often via a hash that includes XID and source
address). NFSv4.1 requests use a non-opaque slot id which is an address). NFSv4.1 requests use a non-opaque slot id which is an
index into a slot table, which is far more efficient. Second, index into a slot table, which is far more efficient. Second,
because RPC requests can be executed by the replier in any order, because RPC requests can be executed by the replier in any order,
there is no bound on the number of requests that may be outstanding there is no bound on the number of requests that may be outstanding
at any time. To achieve perfect EOS using ONC RPC would require at any time. To achieve perfect EOS using ONC RPC would require
storing all replies in the reply cache. XIDs are 32 bits; storing storing all replies in the reply cache. XIDs are 32 bits; storing
over four billion (2^32) replies in the reply cache is not practical. over four billion (2^32) replies in the reply cache is not practical.
In practice, previous versions of NFS have chosen to store a fixed In practice, previous versions of NFS have chosen to store a fixed
number of replies in the cache, and use a least recently used (LRU) number of replies in the cache, and use a least recently used (LRU)
approach to replacing cache entries with new entries when the cache approach to replacing cache entries with new entries when the cache
is full. In NFSv4.1, the number of outstanding requests is bounded is full. In NFSv4.1, the number of outstanding requests is bounded
by the size of the slot table, and a sequence id per slot is used to by the size of the slot table, and a sequence id per slot is used to
tell the replier when it is safe to delete a cached reply. tell the replier when it is safe to delete a cached reply.
In the NFSv4.1 reply cache, when the requester issues a new request, In the NFSv4.1 reply cache, when the requester sends a new request,
it selects a slot id in the range 0..N, where N is the replier's it selects a slot id in the range 0..N, where N is the replier's
current maximum slot id granted to the requester on the session over current maximum slot id granted to the requester on the session over
which the request is to be issued. The value of N starts out as which the request is to be sent. The value of N starts out as equal
equal to ca_maxrequests - 1 (Section 18.36), but can be adjusted by to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the
the response to SEQUENCE or CB_SEQUENCE as described later in this response to SEQUENCE or CB_SEQUENCE as described later in this
section. The slot id must be unused by any of the requests which the section. The slot id must be unused by any of the requests which the
requester already has active on the session. "Unused" here means the requester already has active on the session. "Unused" here means the
requester has no outstanding request for that slot id. requester has no outstanding request for that slot id.
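
The slot selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: pick_slot and its parameters are hypothetical names, and choosing the lowest free slot id is an assumed policy, since the text only requires that the chosen slot id lie in 0..N and be unused.

```python
from typing import Optional, Set


def pick_slot(highest_granted: int, in_use: Set[int]) -> Optional[int]:
    """Return an unused slot id in the range 0..highest_granted, or None.

    highest_granted is N, the replier's current maximum slot id
    (initially ca_maxrequests - 1). in_use holds the slot ids that
    currently have an outstanding request on the session.
    """
    for slot_id in range(highest_granted + 1):
        if slot_id not in in_use:
            return slot_id  # assumed policy: lowest free slot first
    return None  # every slot is busy; the requester has to wait
```

For example, with ca_maxrequests equal to 4, N starts at 3, and a requester with requests outstanding on slots 0 and 1 would pick slot 2.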
A slot contains a sequence id and the cached reply corresponding to A slot contains a sequence id and the cached reply corresponding to
the request sent with that sequence id. The sequence id is a 32 bit the request sent with that sequence id. The sequence id is a 32 bit
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 -
1). The first time a slot is used, the requester must specify a 1). The first time a slot is used, the requester must specify a
sequence id of one (1) (Section 18.36). Each time a slot is re-used, sequence id of one (1) (Section 18.36). Each time a slot is reused,
the request MUST specify a sequence id that is one greater than that the request MUST specify a sequence id that is one greater than that
of the previous request on the slot. If the previous sequence id was of the previous request on the slot. If the previous sequence id was
0xFFFFFFFF, then the next request for the slot MUST have the sequence 0xFFFFFFFF, then the next request for the slot MUST have the sequence
id set to zero (i.e. (2^32 - 1) + 1 mod 2^32). id set to zero (i.e. (2^32 - 1) + 1 mod 2^32).
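
The sequence id progression for a single slot, including the wraparound from 0xFFFFFFFF to zero, can be sketched as below. The function name next_sequence_id and the use of None to mean "slot never used" are assumptions made for illustration.

```python
from typing import Optional

SEQ_MODULUS = 2 ** 32  # sequence ids are 32-bit unsigned values


def next_sequence_id(previous: Optional[int]) -> int:
    """Return the sequence id for the next request on a slot.

    previous is the sequence id of the last request sent on the slot,
    or None if the slot has never been used.
    """
    if previous is None:
        return 1  # the first use of a slot specifies a sequence id of one
    return (previous + 1) % SEQ_MODULUS  # 0xFFFFFFFF wraps to zero
```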
The sequence id accompanies the slot id in each request. It is for The sequence id accompanies the slot id in each request. It is for
the critical check at the server: it is used to efficiently determine the critical check at the server: it is used to efficiently determine
whether a request using a certain slot id is a retransmit or a new, whether a request using a certain slot id is a retransmit or a new,
never-before-seen request. It is not feasible for the client to never-before-seen request. It is not feasible for the client to
assert that it is retransmitting to implement this, because for any assert that it is retransmitting to implement this, because for any
skipping to change at page 49, line 22 skipping to change at page 50, line 19
Given that well-formulated XIDs continue to be required, this raises Given that well-formulated XIDs continue to be required, this raises
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the sessionid in the reply means the
requester does not have to use the XID to look up the sessionid, which requester does not have to use the XID to look up the sessionid, which
would be necessary if the connection were associated with multiple would be necessary if the connection were associated with multiple
sessions. Having the slot id and sequence id in the reply means sessions. Having the slot id and sequence id in the reply means
the requester does not have to use the XID to look up the slot id and the requester does not have to use the XID to look up the slot id and
sequence id. Furthermore, since the XID is only 32 bits, it is too sequence id. Furthermore, since the XID is only 32 bits, it is too
small to guarantee the re-association of a reply with its request small to guarantee the re-association of a reply with its request
([27]); having sessionid, slot id, and sequence id in the reply ([26]); having sessionid, slot id, and sequence id in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always provide a slot id information. The requester must always provide a slot id
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
skipping to change at page 51, line 19 skipping to change at page 52, line 13
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.5.1.2. Optional Reply Caching 2.10.5.1.2. Optional Reply Caching
On a per-request basis the requester can choose to direct the replier On a per-request basis the requester can choose to direct the replier
to cache the reply to all operations after the first operation to cache the reply to all operations after the first operation
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it
would not direct the replier to cache the entire reply is that the would not direct the replier to cache the entire reply is that the
request is composed of all idempotent operations [24]. Caching the request is composed of all idempotent operations [23]. Caching the
reply may offer little benefit. If the reply is too large (see reply may offer little benefit. If the reply is too large (see
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply Section 2.10.5.4), it may not be cacheable anyway. Even if the reply
to an idempotent request is small enough to cache, unnecessarily caching to an idempotent request is small enough to cache, unnecessarily caching
the reply slows down the server and increases RPC latency. the reply slows down the server and increases RPC latency.
Whether the requester requests the reply to be cached or not has no Whether the requester requests the reply to be cached or not has no
effect on the slot processing. If the results of SEQUENCE or effect on the slot processing. If the results of SEQUENCE or
CB_SEQUENCE are NFS4_OK, then the slot's sequence id MUST be CB_SEQUENCE are NFS4_OK, then the slot's sequence id MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
skipping to change at page 51, line 46 skipping to change at page 52, line 40
o The replier enters into its reply cache a reply consisting of the o The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
with the next operation in COMPOUND or CB_COMPOUND having the with the next operation in COMPOUND or CB_COMPOUND having the
error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP.
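The cache entry described in the item above can be sketched as follows. This is an illustration under stated assumptions: build_cache_entry is a hypothetical helper, results are modeled as plain list items, and the case of a request containing only SEQUENCE is handled by an assumed rule (cache the SEQUENCE result alone).

```python
NFS4ERR_RETRY_UNCACHED_REP = "NFS4ERR_RETRY_UNCACHED_REP"


def build_cache_entry(sequence_result, later_results, cachethis):
    """Build what the replier stores in the slot's reply cache.

    sequence_result: result of SEQUENCE or CB_SEQUENCE
    later_results:   results of the operations that followed it
    cachethis:       sa_cachethis / csa_cachethis from the request
    """
    if cachethis:
        # The requester asked for the whole reply to be cached.
        return [sequence_result] + list(later_results)
    if not later_results:
        # Assumption: nothing follows SEQUENCE, so nothing is stubbed.
        return [sequence_result]
    # Keep the original SEQUENCE result; the next operation is replaced
    # by the error a later retry will receive.
    return [sequence_result, NFS4ERR_RETRY_UNCACHED_REP]
```

Either way the slot's sequence id still advances; only the content replayed on a retry differs.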
2.10.5.2. Retry and Replay of Reply 2.10.5.2. Retry and Replay of Reply
A requester MUST NOT retry a request, unless the connection it used A requester MUST NOT retry a request, unless the connection it used
to send the request disconnects. The requester can then reconnect to send the request disconnects. The requester can then reconnect
and resend the request, or it can resend the request over a different and re-send the request, or it can re-send the request over a
connection that is associated with the same session. different connection that is associated with the same session.
If the requester is a server wanting to resend a callback operation If the requester is a server wanting to re-send a callback operation
over the backchannel of a session, the requester of course cannot over the backchannel of a session, the requester of course cannot
reconnect because only the client can associate connections with the reconnect because only the client can associate connections with the
backchannel. The server can resend the request over another backchannel. The server can re-send the request over another
connection that is bound to the same session's backchannel. If there connection that is bound to the same session's backchannel. If there
is no such connection, the server MUST indicate that the session has is no such connection, the server MUST indicate that the session has
no backchannel by setting the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag no backchannel by setting the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag
bit in the response to the next SEQUENCE operation from the client. bit in the response to the next SEQUENCE operation from the client.
The client MUST then associate a connection with the session (or The client MUST then associate a connection with the session (or
destroy the session). destroy the session).
Note that it is not fatal for a client to retry without a disconnect Note that it is not fatal for a client to retry without a disconnect
between the request and retry. However the retry does consume between the request and retry. However the retry does consume
resources, especially with RDMA, where each request, retry or not, resources, especially with RDMA, where each request, retry or not,
consumes a credit. Retries for no reason, especially retries issued consumes a credit. Retries for no reason, especially retries sent
shortly after the previous attempt, are a poor use of network shortly after the previous attempt, are a poor use of network
bandwidth and defeat the purpose of a transport's inherent congestion bandwidth and defeat the purpose of a transport's inherent congestion
control system. control system.
A client MUST wait for a reply to a request before using the slot for A client MUST wait for a reply to a request before using the slot for
another request. If it does not wait for a reply, then the client another request. If it does not wait for a reply, then the client
does not know what sequence id to use for the slot on its next does not know what sequence id to use for the slot on its next
request. For example, suppose a client sends a request with sequence request. For example, suppose a client sends a request with sequence
id 1, and does not wait for the response. The next time it uses the id 1, and does not wait for the response. The next time it uses the
slot, it sends the new request with sequence id 2. If the server has slot, it sends the new request with sequence id 2. If the server has
skipping to change at page 52, line 36 skipping to change at page 53, line 31
expecting sequence id 2, and rejects the client's new request with expecting sequence id 2, and rejects the client's new request with
NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE). NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE).
RDMA fabrics do not guarantee that the memory handles (Steering Tags) RDMA fabrics do not guarantee that the memory handles (Steering Tags)
within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that
of a single connection. Therefore, handles used by the direct of a single connection. Therefore, handles used by the direct
operations become invalid after connection loss. The server must operations become invalid after connection loss. The server must
ensure that any RDMA operations which must be replayed from the reply ensure that any RDMA operations which must be replayed from the reply
cache use the newly provided handle(s) from the most recent request. cache use the newly provided handle(s) from the most recent request.
A retry might be issued while the original request is still in A retry might be sent while the original request is still in progress
progress on the replier. The replier SHOULD deal with issue by on the replier. The replier SHOULD deal with the issue by
returning NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE returning NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE
operation, but implementations MAY return NFS4ERR_SEQ_MISORDERED. Since operation, but implementations MAY return NFS4ERR_SEQ_MISORDERED. Since
errors from SEQUENCE and CB_SEQUENCE are never recorded in the reply errors from SEQUENCE and CB_SEQUENCE are never recorded in the reply
cache, this approach allows the results of the execution of the cache, this approach allows the results of the execution of the
original request to be properly recorded in the reply cache (assuming original request to be properly recorded in the reply cache (assuming
the requester specified the reply to be cached). the requester specified the reply to be cached).
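The replier's handling of a slot, including a retry that arrives while the original request is still executing, can be sketched as below. The Slot class and the on_sequence and finish helpers are hypothetical names. Treating "next sequence id while the slot is busy" as misordered is an assumption of this sketch, since the text above only covers the retry case.

```python
NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class Slot:
    def __init__(self):
        self.seqid = 0            # sequence id of the last accepted request
        self.in_progress = False  # True while that request is executing
        self.cached_reply = None  # reply to replay on a retry


def on_sequence(slot, seqid):
    """Classify an incoming SEQUENCE for this slot."""
    if seqid == slot.seqid:
        # Retry of the last accepted request.
        if slot.in_progress:
            # Error is not cached, so the original can still record its reply.
            return ("error", NFS4ERR_DELAY)
        return ("replay", slot.cached_reply)
    if seqid == (slot.seqid + 1) & 0xFFFFFFFF and not slot.in_progress:
        # New request: SEQUENCE succeeds, so the sequence id advances by one.
        slot.seqid = seqid
        slot.in_progress = True
        slot.cached_reply = None
        return ("new", None)
    return ("error", NFS4ERR_SEQ_MISORDERED)


def finish(slot, reply):
    """Record the reply once the request has executed."""
    slot.cached_reply = reply
    slot.in_progress = False
```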
2.10.5.3. Resolving Server Callback Races 2.10.5.3. Resolving Server Callback Races
It is possible for server callbacks to arrive at the client before It is possible for server callbacks to arrive at the client before
skipping to change at page 53, line 48 skipping to change at page 54, line 43
destroyed session), then there is no race with respect to this destroyed session), then there is no race with respect to this
triple. The server SHOULD limit the referring triples to requests triple. The server SHOULD limit the referring triples to requests
that refer to just those that apply to the objects referred to in the that refer to just those that apply to the objects referred to in the
CB_COMPOUND procedure. CB_COMPOUND procedure.
The client must not simply wait forever for the expected server reply The client must not simply wait forever for the expected server reply
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
because it is possible that it will be delayed indefinitely. The because it is possible that it will be delayed indefinitely. The
client should assume the likely case that the reply will arrive client should assume the likely case that the reply will arrive
within the average round trip time for COMPOUND requests to the within the average round trip time for COMPOUND requests to the
server, and wait that period of time. If that period of expires it server, and wait that period of time. If that period of time expires
can respond to the CB_COMPOUND with NFS4ERR_DELAY. it can respond to the CB_COMPOUND with NFS4ERR_DELAY.
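The bounded wait described above can be sketched as follows. This is a minimal illustration: resolve_callback_race is a hypothetical helper, the threading.Event stands in for whatever mechanism signals that the expected reply has arrived, and the caller is assumed to supply its own round trip estimate.

```python
import threading


def resolve_callback_race(reply_arrived, avg_rtt_seconds):
    """Wait up to one average COMPOUND round trip for the racing reply.

    reply_arrived:   threading.Event set when the expected reply is received
    avg_rtt_seconds: the client's estimate of the average round trip time
    """
    if reply_arrived.wait(timeout=avg_rtt_seconds):
        return "PROCESS_CALLBACK"  # the reply won in time; handle the callback
    return "NFS4ERR_DELAY"         # give up waiting; the server can retry
```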
There are other scenarios under which callbacks may race replies, There are other scenarios under which callbacks may race replies,
among them pNFS layout recalls, described in Section 12.5.4.2. among them pNFS layout recalls, described in Section 12.5.5.2.
2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues 2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues
Very large requests and replies may pose both buffer management Very large requests and replies may pose both buffer management
issues (especially with RDMA) and reply cache issues. When the issues (especially with RDMA) and reply cache issues. When the
session is created, (Section 18.36), for each channel (fore and session is created, (Section 18.36), for each channel (fore and
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
skipping to change at page 55, line 16 skipping to change at page 56, line 14
A client needs to take care that when sending operations that change A client needs to take care that when sending operations that change
the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH and the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH and
RESTOREFH) that it not exceed the maximum reply buffer before the RESTOREFH) that it not exceed the maximum reply buffer before the
GETFH operation. Otherwise the client will have to retry the GETFH operation. Otherwise the client will have to retry the
operation that changed the current filehandle, in order to obtain the operation that changed the current filehandle, in order to obtain the
desired filehandle. For the OPEN operation (see Section 18.16), desired filehandle. For the OPEN operation (see Section 18.16),
retry is not always available as an option. The following guidelines retry is not always available as an option. The following guidelines
for the handling of filehandle changing operations are advised: for the handling of filehandle changing operations are advised:
o Within the same COMPOUND procedure, a client SHOULD issue GETFH o Within the same COMPOUND procedure, a client SHOULD send GETFH
immediately after a current filehandle changing operation. A immediately after a current filehandle changing operation. A
client MUST issue GETFH after a current filehandle changing client MUST send GETFH after a current filehandle changing
operation that is also non-idempotent (for example, the OPEN operation that is also non-idempotent (for example, the OPEN
operation), unless the operation is RESTOREFH. RESTOREFH is an operation), unless the operation is RESTOREFH. RESTOREFH is an
exception, because even though it is non-idempotent, the exception, because even though it is non-idempotent, the
filehandle RESTOREFH produced originated from an operation that is filehandle RESTOREFH produced originated from an operation that is
either idempotent (e.g. PUTFH, LOOKUP), or non-idempotent (e.g. either idempotent (e.g. PUTFH, LOOKUP), or non-idempotent (e.g.
OPEN, CREATE). If the origin is non-idempotent, then because the OPEN, CREATE). If the origin is non-idempotent, then because the
client MUST issue GETFH after the origin operation, the client can client MUST send GETFH after the origin operation, the client can
recover if RESTOREFH returns an error. recover if RESTOREFH returns an error.
o A server MAY return NFS4ERR_REP_TOO_BIG or o A server MAY return NFS4ERR_REP_TOO_BIG or
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a
filehandle changing operation if the reply would be too large on filehandle changing operation if the reply would be too large on
the next operation. the next operation.
o A server SHOULD return NFS4ERR_REP_TOO_BIG or o A server SHOULD return NFS4ERR_REP_TOO_BIG or
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a
filehandle changing non-idempotent operation if the reply would be filehandle changing non-idempotent operation if the reply would be
skipping to change at page 56, line 35 skipping to change at page 57, line 33
restart: restart:
o The client ID. This is a prerequisite to let the client create o The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the more sessions associated with the same client ID as the
o The client ID's sequenceid that is used for creating sessions (see o The client ID's sequenceid that is used for creating sessions (see
Section 18.35 and Section 18.36). This is a prerequisite to let Section 18.35 and Section 18.36). This is a prerequisite to let
the client create more sessions. the client create more sessions.
o The principal that created the client ID. This allows the server o The principal that created the client ID. This allows the server
to authenticate the client when it issues EXCHANGE_ID. to authenticate the client when it sends EXCHANGE_ID.
o The SSV, if SP4_SSV state protection was specified when the client o The SSV, if SP4_SSV state protection was specified when the client
ID was created (see Section 18.35). This lets the client create ID was created (see Section 18.35). This lets the client create
new sessions, and associate connections with the new and existing new sessions, and associate connections with the new and existing
sessions. sessions.
o The properties of the client ID as defined in Section 18.35. o The properties of the client ID as defined in Section 18.35.
A persistent reply cache places certain demands on the server. The A persistent reply cache places certain demands on the server. The
execution of the sequence of operations (starting with SEQUENCE) and execution of the sequence of operations (starting with SEQUENCE) and
skipping to change at page 57, line 19 skipping to change at page 58, line 17
view the problem is as a single transaction consisting of each view the problem is as a single transaction consisting of each
operation in the COMPOUND followed by storing the result in operation in the COMPOUND followed by storing the result in
persistent storage, then finally a transaction commit. If there is a persistent storage, then finally a transaction commit. If there is a
failure before the transaction is committed, then the server rolls failure before the transaction is committed, then the server rolls
back the transaction. If the server itself fails, then when it restarts, back the transaction. If the server itself fails, then when it restarts,
its recovery logic could roll back the transaction before starting its recovery logic could roll back the transaction before starting
the NFSv4.1 server. the NFSv4.1 server.
While the description of the implementation for atomic execution of While the description of the implementation for atomic execution of
the request and caching of the reply is beyond the scope of this the request and caching of the reply is beyond the scope of this
document, an example implementation for NFS version 2 is described in document, an example implementation for NFSv2 [27] is described in
[28]. [28].
2.10.6. RDMA Considerations 2.10.6. RDMA Considerations
A complete discussion of the operation of RPC-based protocols over A complete discussion of the operation of RPC-based protocols over
RDMA transports is in [8]. A discussion of the operation of NFSv4, RDMA transports is in [8]. A discussion of the operation of NFSv4,
including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, including NFSv4.1, over RDMA is in [9]. Where RDMA is considered,
this specification assumes the use of such a layering; it addresses this specification assumes the use of such a layering; it addresses
only the upper layer issues relevant to making best use of RPC/RDMA. only the upper layer issues relevant to making best use of RPC/RDMA.
skipping to change at page 58, line 33 skipping to change at page 59, line 32
document]]) of [8]; if there are multiple RDMA connections, then the document]]) of [8]; if there are multiple RDMA connections, then the
maximum number of requests for a channel will be divided among the maximum number of requests for a channel will be divided among the
RDMA connections. Put a different way, the onus is on the replier to RDMA connections. Put a different way, the onus is on the replier to
ensure that the total number of RDMA credits across all connections ensure that the total number of RDMA credits across all connections
associated with the replier's channel does not exceed the channel's associated with the replier's channel does not exceed the channel's
maximum number of outstanding requests. maximum number of outstanding requests.
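One way to divide a channel's request limit among its RDMA connections is sketched below. It assumes the intent is that the credits granted across all connections stay within the channel's maximum number of outstanding requests. The even split and the name divide_credits are illustrative choices, not something this document prescribes.

```python
def divide_credits(max_requests, n_connections):
    """Split max_requests credits across n_connections as evenly as possible.

    The returned per-connection grants always sum to max_requests, so the
    total never exceeds the channel's limit.
    """
    if n_connections <= 0:
        raise ValueError("at least one connection is required")
    base, extra = divmod(max_requests, n_connections)
    # The first `extra` connections receive one additional credit.
    return [base + 1 if i < extra else base for i in range(n_connections)]
```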
The limits may also be modified dynamically at the replier's choosing The limits may also be modified dynamically at the replier's choosing
by manipulating certain parameters present in each NFSv4.1 reply. In by manipulating certain parameters present in each NFSv4.1 reply. In
addition, the CB_RECALL_SLOT callback operation (see Section 20.8) addition, the CB_RECALL_SLOT callback operation (see Section 20.8)
can be issued by a server to a client to return RDMA credits to the can be sent by a server to a client to return RDMA credits to the
server, thereby lowering the maximum number of requests a client can server, thereby lowering the maximum number of requests a client can
have outstanding to the server. have outstanding to the server.
2.10.6.3. Padding 2.10.6.3. Padding
Header padding is requested by each peer at session initiation (see Header padding is requested by each peer at session initiation (see
the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), the ca_headerpadsize argument to CREATE_SESSION in Section 18.36),
and subsequently used by the RPC RDMA layer, as described in [8]. and subsequently used by the RPC RDMA layer, as described in [8].
Zero padding is permitted. Zero padding is permitted.
skipping to change at page 60, line 43 skipping to change at page 61, line 43
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Section 18.35 Also note that the SP4_SSV state protection mode (see Section 18.35
and Section 2.10.7.3) has the side benefit of providing SSV-derived and Section 2.10.7.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.8). RPCSEC_GSS contexts (Section 2.10.8).
2.10.7.3. Protection from Unauthorized State Changes 2.10.7.3. Protection from Unauthorized State Changes
As described to this point in the specification, the state model of As described to this point in the specification, the state model of
NFSv4.1 is vulnerable to an attacker that issues a SEQUENCE operation NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation
with a forged sessionid and with a slot id that it expects the with a forged sessionid and with a slot id that it expects the
legitimate client to use next. When the legitimate client uses the legitimate client to use next. When the legitimate client uses the
slot id with the same sequence number, the server returns the slot id with the same sequence number, the server returns the
attacker's result from the reply cache which disrupts the legitimate attacker's result from the reply cache which disrupts the legitimate
client and thus denies service to it. Similarly an attacker could client and thus denies service to it. Similarly an attacker could
issue a CREATE_SESSION with a forged client ID to create a new send a CREATE_SESSION with a forged client ID to create a new session
session associated with the client ID. The attacker could issue associated with the client ID. The attacker could send requests
requests using the new session that change locking state, such as using the new session that change locking state, such as LOCKU
LOCKU operations to release locks the legitimate client has acquired. operations to release locks the legitimate client has acquired.
Setting a security policy on the file which requires RPCSEC_GSS Setting a security policy on the file which requires RPCSEC_GSS
credentials when manipulating the file's state is one potential work credentials when manipulating the file's state is one potential work
around, but has the disadvantage of preventing a legitimate client around, but has the disadvantage of preventing a legitimate client
from releasing state when RPCSEC_GSS is required to do so, but a GSS from releasing state when RPCSEC_GSS is required to do so, but a GSS
context cannot be obtained (possibly because the user has logged off context cannot be obtained (possibly because the user has logged off
the client). the client).
NFSv4.1 provides three options to a client for state protection which NFSv4.1 provides three options to a client for state protection which
are specified when a client creates a client ID via EXCHANGE_ID are specified when a client creates a client ID via EXCHANGE_ID
skipping to change at page 62, line 36 skipping to change at page 63, line 36
The SP4_SSV protection option uses a Secret State Verifier (SSV) The SP4_SSV protection option uses a Secret State Verifier (SSV)
which is shared between a client and server. The SSV serves as the which is shared between a client and server. The SSV serves as the
secret key for an internal (that is, internal to NFSv4.1) GSS secret key for an internal (that is, internal to NFSv4.1) GSS
mechanism that uses the secret key for Message Integrity Code (MIC) mechanism that uses the secret key for Message Integrity Code (MIC)
and Wrap tokens (Section 2.10.8). The SP4_SSV protection option is and Wrap tokens (Section 2.10.8). The SP4_SSV protection option is
intended for the client that has multiple users, and the system intended for the client that has multiple users, and the system
administrator does not wish to configure a permanent machine administrator does not wish to configure a permanent machine
credential for each client. The SSV is established on the server via credential for each client. The SSV is established on the server via
SET_SSV (see Section 18.47). To prevent eavesdropping, a client SET_SSV (see Section 18.47). To prevent eavesdropping, a client
SHOULD issue SET_SSV via RPCSEC_GSS with the privacy service. SHOULD send SET_SSV via RPCSEC_GSS with the privacy service. Several
Several aspects of the SSV make it intractable for an attacker to aspects of the SSV make it intractable for an attacker to guess the
guess the SSV, and thus associate rogue connections with a session, SSV, and thus associate rogue connections with a session, and rogue
and rogue sessions with a client ID: sessions with a client ID:
o The arguments to and results of SET_SSV include digests of the old o The arguments to and results of SET_SSV include digests of the old
and new SSV, respectively. and new SSV, respectively.
o Because the initial value of the SSV is zero, therefore known, the o Because the initial value of the SSV is zero, therefore known, the
client that opts for SP4_SSV protection and opts to apply SP4_SSV client that opts for SP4_SSV protection and opts to apply SP4_SSV
protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST issue protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at
at least one SET_SSV operation before the first least one SET_SSV operation before the first BIND_CONN_TO_SESSION
BIND_CONN_TO_SESSION operation or before the second CREATE_SESSION operation or before the second CREATE_SESSION operation on a
operation on a client ID. If it does not, the SSV mechanism will client ID. If it does not, the SSV mechanism will not generate
not generate tokens (Section 2.10.8). A client SHOULD issue tokens (Section 2.10.8). A client SHOULD send SET_SSV as soon as
SET_SSV as soon as a session is created. a session is created.
o A SET_SSV does not replace the SSV with the argument to SET_SSV. o A SET_SSV does not replace the SSV with the argument to SET_SSV.
Instead, the current SSV on the server is logically exclusive ORed Instead, the current SSV on the server is logically exclusive ORed
(XORed) with the argument to SET_SSV. Each time a new principal (XORed) with the argument to SET_SSV. Each time a new principal
uses a client ID for the first time, the client SHOULD issue a uses a client ID for the first time, the client SHOULD send a
SET_SSV with that principal's RPCSEC_GSS credentials, with SET_SSV with that principal's RPCSEC_GSS credentials, with
RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY.
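The XOR update in the last item can be sketched as follows. This is an illustration only: xor_ssv is a hypothetical helper, the requirement that both values have equal length is an assumption of the sketch, and the digests carried by SET_SSV are not modeled.

```python
def xor_ssv(current_ssv, set_ssv_arg):
    """Return the new SSV: the current SSV XORed with the SET_SSV argument."""
    if len(current_ssv) != len(set_ssv_arg):
        raise ValueError("SSV and SET_SSV argument must have equal length")
    return bytes(a ^ b for a, b in zip(current_ssv, set_ssv_arg))
```

Because each update folds into the previous value, an observer who learns the argument of one SET_SSV still needs the SSV that preceded it. Only the first update, applied to the known all-zero initial value, yields an SSV equal to its argument.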
Here are the types of attacks that can be attempted by an attacker Here are the types of attacks that can be attempted by an attacker
named Eve on a victim named Bob, and how SP4_SSV protection foils named Eve on a victim named Bob, and how SP4_SSV protection foils
each attack: each attack:
o Suppose Eve is the first user to log into a legitimate client. o Suppose Eve is the first user to log into a legitimate client.
Eve's use of an NFSv4.1 file system will cause an SSV to be Eve's use of an NFSv4.1 file system will cause an SSV to be
created via the legitimate client's NFSv4.1 implementation. The created via the legitimate client's NFSv4.1 implementation. The
SET_SSV that creates the SSV will be protected by the RPCSEC_GSS SET_SSV that creates the SSV will be protected by the RPCSEC_GSS
context created by the legitimate client which uses Eve's GSS context created by the legitimate client which uses Eve's GSS
principal and credentials. Eve can eavesdrop on the network while principal and credentials. Eve can eavesdrop on the network while
her RPCSEC_GSS context is created, and the SET_SSV using her her RPCSEC_GSS context is created, and the SET_SSV using her
context is issued. Even if the legitimate client issues the context is sent. Even if the legitimate client sends the SET_SSV
SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve knows her own with RPC_GSS_SVC_PRIVACY, because Eve knows her own credentials,
credentials, she can decrypt the SSV. Eve can compute an she can decrypt the SSV. Eve can compute an RPCSEC_GSS credential
RPCSEC_GSS credential that BIND_CONN_TO_SESSION will accept, and that BIND_CONN_TO_SESSION will accept, and so associate a new
so associate a new connection with the legitimate session. Eve connection with the legitimate session. Eve can change the slot
can change the slot id and sequence state of a legitimate session, id and sequence state of a legitimate session, and/or the SSV
and/or the SSV state, in such a way that when Bob accesses the state, in such a way that when Bob accesses the server via the
server via the same legitimate client, the legitimate client will same legitimate client, the legitimate client will be unable to
be unable to use the session. use the session.
The client's only recourse is to create a new client ID for Bob to The client's only recourse is to create a new client ID for Bob to
use, and establish a new SSV for the client ID. The client will use, and establish a new SSV for the client ID. The client will
be unable to delete the old client ID, and will let the lease on be unable to delete the old client ID, and will let the lease on
old client ID expire. old client ID expire.
Once the legitimate client establishes an SSV over the new session Once the legitimate client establishes an SSV over the new session
using Bob's RPCSEC_GSS context, Eve can use the new session via using Bob's RPCSEC_GSS context, Eve can use the new session via
the legitimate client, but she cannot disrupt Bob. Moreover, the legitimate client, but she cannot disrupt Bob. Moreover,
because the client SHOULD have modified the SSV due to Eve using because the client SHOULD have modified the SSV due to Eve using
the new session, Bob cannot get revenge on Eve by associating a the new session, Bob cannot get revenge on Eve by associating a
rogue connection with the session. rogue connection with the session.
The question is how did the legitimate client detect that Eve has The question is how did the legitimate client detect that Eve has
hijacked the old session? When the client detects that a new hijacked the old session? When the client detects that a new
principal, Bob, wants to use the session, it SHOULD have issued a principal, Bob, wants to use the session, it SHOULD have sent a
SET_SSV, which leads to the following sub-scenarios: SET_SSV, which leads to the following sub-scenarios:
* Let us suppose that from the rogue connection, Eve issued a * Let us suppose that from the rogue connection, Eve sent a
SET_SSV with the same slot id and sequence that the legitimate SET_SSV with the same slot id and sequence that the legitimate
client later uses. The server will assume this is a retry, and client later uses. The server will assume this is a retry, and
return to the legitimate client the reply it sent Eve. However, return to the legitimate client the reply it sent Eve. However,
unless Eve can correctly guess the SSV the legitimate client unless Eve can correctly guess the SSV the legitimate client
will use, the digest verification checks in the SET_SSV will use, the digest verification checks in the SET_SSV
response will fail. That is an indication to the client that response will fail. That is an indication to the client that
the session has apparently been hijacked. the session has apparently been hijacked.
* Alternatively, Eve issued a SET_SSV with a different slot id * Alternatively, Eve sent a SET_SSV with a different slot id than
than the legitimate client uses for its SET_SSV. Then the the legitimate client uses for its SET_SSV. Then the digest
digest verification on the server fails, and it is again verification on the server fails, and it is again apparent to
apparent to the client that the session has been hijacked. the client that the session has been hijacked.
* Alternatively, Eve issued an operation other than SET_SSV, but * Alternatively, Eve sent an operation other than SET_SSV, but
with the same slot id and sequence that the legitimate client with the same slot id and sequence that the legitimate client
uses for its SET_SSV. The server returns to the legitimate uses for its SET_SSV. The server returns to the legitimate
client the response it sent Eve. The client sees that the client the response it sent Eve. The client sees that the
response is not at all what it expects. The client assumes response is not at all what it expects. The client assumes
either session hijacking or a server bug, and either way either session hijacking or a server bug, and either way
destroys the old session. destroys the old session.
o Eve associates a rogue connection with the session as above, and o Eve associates a rogue connection with the session as above, and
then destroys the session. Again, Bob goes to use the server from then destroys the session. Again, Bob goes to use the server from
the legitimate client by issuing a SET_SSV. The client receives the legitimate client by issuing a SET_SSV. The client receives
an error that indicates the session does not exist. When the an error that indicates the session does not exist. When the
client tries to create a new session, this will fail because the client tries to create a new session, this will fail because the
SSV it has does not match the one the server has, and now the client knows SSV it has does not match the one the server has, and now the client knows
the session was hijacked. The legitimate client establishes a new the session was hijacked. The legitimate client establishes a new
client ID as before. client ID as before.
o If Eve creates a connection before the legitimate client o If Eve creates a connection before the legitimate client
establishes an SSV, because the initial value of the SSV is zero establishes an SSV, because the initial value of the SSV is zero
and therefore known, Eve can issue a SET_SSV that will pass the and therefore known, Eve can send a SET_SSV that will pass the
digest verification check. However, because the new connection has digest verification check. However, because the new connection has
not been associated with the session, the SET_SSV is rejected for not been associated with the session, the SET_SSV is rejected for
that reason. that reason.
In summary an attacker's disruption of state when SP4_SSV protection In summary an attacker's disruption of state when SP4_SSV protection
is in use is limited to the formative period of a client ID, its is in use is limited to the formative period of a client ID, its
first session, and the establishment of the SSV. Once a non- first session, and the establishment of the SSV. Once a non-
malicious user uses the client ID, the client quickly detects any malicious user uses the client ID, the client quickly detects any
hijack and rectifies the situation. Once a non-malicious user hijack and rectifies the situation. Once a non-malicious user
successfully modifies the SSV, the attacker cannot use NFSv4.1 successfully modifies the SSV, the attacker cannot use NFSv4.1
skipping to change at page 65, line 33 skipping to change at page 66, line 33
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage (emitted by GSS_Wrap). SealedMessage (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define
any initial context tokens, the OID can be used to let servers any initial context tokens, the OID can be used to let servers
indicate that the SSV mechanism is acceptable whenever the client indicate that the SSV mechanism is acceptable whenever the client
issues a SECINFO or SECINFO_NO_NAME operation (see Section 2.6). sends a SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys derived from the SSV value. The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked the subkeys are recalculated by the Each time SET_SSV is invoked the subkeys are recalculated by the
client and server. The four subkeys are calculated from each of client and server. The four subkeys are calculated from each of
the valid ssv_subkey4 enumerated values. The calculation uses the the valid ssv_subkey4 enumerated values. The calculation uses the
HMAC ([11]) algorithm, using the current SSV as the key, the one way HMAC ([11]) algorithm, using the current SSV as the key, the one way
hash algorithm as negotiated by EXCHANGE_ID, and the input text as hash algorithm as negotiated by EXCHANGE_ID, and the input text as
represented by the XDR encoded enumeration of type ssv_subkey4. represented by the XDR encoded enumeration of type ssv_subkey4.
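The subkey calculation described above can be sketched in Python as follows. This is a minimal illustration, not the draft's normative definition: the numeric values assigned to the ssv_subkey4 enumeration are not visible in this excerpt and are assumed here, the helper names are hypothetical, and SHA-256 merely stands in for whatever one way hash EXCHANGE_ID negotiated.

```python
import hashlib
import hmac
import struct

# ASSUMPTION: enumeration values for ssv_subkey4; the excerpt does not show them.
SSV4_SUBKEY_MIC_I2T = 1
SSV4_SUBKEY_MIC_T2I = 2
SSV4_SUBKEY_SEAL_I2T = 3
SSV4_SUBKEY_SEAL_T2I = 4

ALL_SUBKEYS = (
    SSV4_SUBKEY_MIC_I2T,
    SSV4_SUBKEY_MIC_T2I,
    SSV4_SUBKEY_SEAL_I2T,
    SSV4_SUBKEY_SEAL_T2I,
)


def xdr_enum(value):
    """XDR encodes an enum as a 4-byte big-endian signed integer."""
    return struct.pack(">i", value)


def derive_subkeys(ssv, hash_name="sha256"):
    """Recompute all four subkeys from the current SSV.

    The SSV is the HMAC key; the XDR encoded enum value is the input text.
    hash_name is a stand-in for the negotiated one way hash algorithm.
    """
    return {
        which: hmac.new(ssv, xdr_enum(which), hash_name).digest()
        for which in ALL_SUBKEYS
    }
```

Because the SSV is the HMAC key, both sides would call something like derive_subkeys() again after every successful SET_SSV, which yields four fresh, mutually distinct subkeys.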
/* Input for computing subkeys */ /* Input for computing subkeys */
skipping to change at page 68, line 40 skipping to change at page 69, line 40
confidentiality. This is because the SSV mechanism is for confidentiality. This is because the SSV mechanism is for
RPCSEC_GSS, and RPCSEC_GSS never produces GSS_wrap() tokens without RPCSEC_GSS, and RPCSEC_GSS never produces GSS_wrap() tokens without
confidentiality. confidentiality.
Effectively there is a single GSS context for a single client ID. Effectively there is a single GSS context for a single client ID.
All RPCSEC_GSS handles share the same GSS context. SSV GSS contexts All RPCSEC_GSS handles share the same GSS context. SSV GSS contexts
do not expire except when the SSV is destroyed (causes would include do not expire except when the SSV is destroyed (causes would include
the client ID being destroyed or a server restart). Since one the client ID being destroyed or a server restart). Since one
purpose of context expiration is to replace keys that have been in purpose of context expiration is to replace keys that have been in
use for "too long" hence vulnerable to compromise by brute force or use for "too long" hence vulnerable to compromise by brute force or
accident, the client can issue periodic SET_SSV operations, by accident, the client can send periodic SET_SSV operations, by cycling
cycling through different users' RPCSEC_GSS credentials. This way through different users' RPCSEC_GSS credentials. This way the SSV is
the SSV is replaced without destroying the SSV's GSS contexts. replaced without destroying the SSV's GSS contexts.
SSV RPCSEC_GSS handles can be expired or deleted by the server at any SSV RPCSEC_GSS handles can be expired or deleted by the server at any
time and the EXCHANGE_ID operation can be used to create more SSV time and the EXCHANGE_ID operation can be used to create more SSV
RPCSEC_GSS handles. RPCSEC_GSS handles.
The client MUST establish an SSV via SET_SSV before the SSV GSS The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
skipping to change at page 70, line 7 skipping to change at page 71, line 7
is not being used for the fore channel, there is no way the client is not being used for the fore channel, there is no way the client
can tell if the connection is still alive (e.g., the server restarted can tell if the connection is still alive (e.g., the server restarted
without sending a disconnect). The onus is on the server, not the without sending a disconnect). The onus is on the server, not the
client, to determine if the backchannel's connection is alive, and client, to determine if the backchannel's connection is alive, and
to indicate in the response to a SEQUENCE operation when the last to indicate in the response to a SEQUENCE operation when the last
connection associated with a session's backchannel has connection associated with a session's backchannel has
disconnected. disconnected.
2.10.9.3. Steps the Client Takes To Establish a Session 2.10.9.3. Steps the Client Takes To Establish a Session
If the client does not have a client ID, the client issues If the client does not have a client ID, the client sends EXCHANGE_ID
EXCHANGE_ID to establish a client ID. If it opts for SP4_MACH_CRED to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV
or SP4_SSV protection, in the spo_must_enforce list of operations, it protection, in the spo_must_enforce list of operations, it SHOULD at
SHOULD at minimum specify: CREATE_SESSION, DESTROY_SESSION, minimum specify: CREATE_SESSION, DESTROY_SESSION,
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts
for SP4_SSV protection, the client needs to ask for SSV-based for SP4_SSV protection, the client needs to ask for SSV-based
RPCSEC_GSS handles. RPCSEC_GSS handles.
The client uses the client ID to issue a CREATE_SESSION on a The client uses the client ID to send a CREATE_SESSION on a
connection to the server. The results of CREATE_SESSION indicate connection to the server. The results of CREATE_SESSION indicate
whether the server will persist the session reply cache through a whether the server will persist the session reply cache through a
server restart or not, and the client notes this for future server restart or not, and the client notes this for future
reference. reference.
If the client specified SP4_SSV state protection when the client ID If the client specified SP4_SSV state protection when the client ID
was created, then it SHOULD issue SET_SSV in the first COMPOUND after was created, then it SHOULD send SET_SSV in the first COMPOUND after
the session is created. Each time a new principal goes to use the the session is created. Each time a new principal goes to use the
client ID, it SHOULD issue a SET_SSV again. client ID, it SHOULD send a SET_SSV again.
If the client wants to use delegations, layouts, directory If the client wants to use delegations, layouts, directory
notifications, or any other state that requires a backchannel, then notifications, or any other state that requires a backchannel, then
it must add a connection to the backchannel if CREATE_SESSION did not it must add a connection to the backchannel if CREATE_SESSION did not
already do so. The client creates a connection, and calls already do so. The client creates a connection, and calls
BIND_CONN_TO_SESSION to associate the connection with the session and BIND_CONN_TO_SESSION to associate the connection with the session and
the session's backchannel. If CREATE_SESSION did not already do so, the session's backchannel. If CREATE_SESSION did not already do so,
the client MUST tell the server what security is required in order the client MUST tell the server what security is required in order
for the client to accept callbacks. The client does this via for the client to accept callbacks. The client does this via
BACKCHANNEL_CTL. If the client selected SP4_MACH_CRED or SP4_SSV BACKCHANNEL_CTL. If the client selected SP4_MACH_CRED or SP4_SSV
skipping to change at page 72, line 28 skipping to change at page 73, line 28
refers to the lost session. refers to the lost session.
After an event like a server restart, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session was lost. If the
connection survives session loss, then the next SEQUENCE operation connection survives session loss, then the next SEQUENCE operation
the client issues over the connection will get back the client sends over the connection will get back
NFS4ERR_BADSESSION. The client again knows the session was lost. NFS4ERR_BADSESSION. The client again knows the session was lost.
When the client detects session loss, it must call CREATE_SESSION to When the client detects session loss, it must call CREATE_SESSION to
recover. Any non-idempotent operations that were in progress may recover. Any non-idempotent operations that were in progress may
have been performed on the server at the time of session loss. The have been performed on the server at the time of session loss. The
client has no general way to recover from this. client has no general way to recover from this.
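The detection and recovery steps above can be sketched as a small decision procedure. Everything here is illustrative: StubServer is a hypothetical in-memory stand-in rather than a real NFS client API, and the string constants only mirror the names used in the text.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_BADSESSION = "NFS4ERR_BADSESSION"


class StubServer:
    """Hypothetical in-memory server used only to exercise the logic."""

    def __init__(self):
        self.sessions = set()
        self.next_id = 0

    def create_session(self):
        self.next_id += 1
        sessionid = "session-%d" % self.next_id
        self.sessions.add(sessionid)
        return sessionid

    def probe(self, op, sessionid):
        # op is "BIND_CONN_TO_SESSION" or "SEQUENCE"; both fail the same
        # way when the session is unknown to the server.
        if sessionid in self.sessions:
            return NFS4_OK
        return NFS4ERR_BADSESSION

    def restart(self):
        # Models a restart that does not preserve session state.
        self.sessions.clear()


def recover_session(server, sessionid, enforces_binding):
    """After reconnecting, probe the old session and replace it if lost."""
    op = "BIND_CONN_TO_SESSION" if enforces_binding else "SEQUENCE"
    if server.probe(op, sessionid) == NFS4ERR_BADSESSION:
        # Session loss detected: the client must call CREATE_SESSION.
        return server.create_session()
    return sessionid
```

The sketch deliberately says nothing about non-idempotent operations that were in flight; as the text notes, the client has no general way to recover those.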
Note that loss of session does not imply loss of lock, open, Note that loss of session does not imply loss of lock, open,
delegation, or layout state because locks, opens, delegations, and delegation, or layout state because locks, opens, delegations, and
layouts are tied to the client ID and depend on the client ID, not layouts are tied to the client ID and depend on the client ID, not
skipping to change at page 73, line 11 skipping to change at page 74, line 11
(for example the server restarts and does not preserve client ID (for example the server restarts and does not preserve client ID
state). If so, the client needs to call EXCHANGE_ID, followed by state). If so, the client needs to call EXCHANGE_ID, followed by
CREATE_SESSION. CREATE_SESSION.
2.10.10.2. Events Requiring Server Action 2.10.10.2. Events Requiring Server Action
The following events require server action to recover. The following events require server action to recover.
2.10.10.2.1. Client Crash and Restart 2.10.10.2.1. Client Crash and Restart
As described in Section 18.35, a restarted client issues EXCHANGE_ID As described in Section 18.35, a restarted client sends EXCHANGE_ID
in such a way that it causes the server to delete any sessions it had. in such a way that it causes the server to delete any sessions it had.
2.10.10.2.2. Client Crash with No Restart 2.10.10.2.2. Client Crash with No Restart
If a client crashes and never comes back, it will never issue If a client crashes and never comes back, it will never send
EXCHANGE_ID with its old client owner. Thus the server has session EXCHANGE_ID with its old client owner. Thus the server has session
state that will never be used again. After an extended period of state that will never be used again. After an extended period of
time and if the server has resource constraints, it MAY destroy the time and if the server has resource constraints, it MAY destroy the
old session as well as locking state. old session as well as locking state.
2.10.10.2.3. Extended Network Partition 2.10.10.2.3. Extended Network Partition
To the server, the extended network partition may be no different To the server, the extended network partition may be no different
from a client crash with no restart (see Section 2.10.10.2.2). from a client crash with no restart (see Section 2.10.10.2.2).
Unless the server can discern that there is a network partition, it Unless the server can discern that there is a network partition, it
skipping to change at page 74, line 27 skipping to change at page 75, line 27
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 13.1 for pNFS it will allow the sessions to be used. See Section 13.1 for pNFS
sessions considerations. sessions considerations.
3. Protocol Data Types 3. Protocol Constants and Data Types
The syntax and semantics to describe the data types of the NFS The syntax and semantics to describe the data types of the NFSv4.1
version 4 protocol are defined in the XDR RFC4506 [2] and RPC RFC1831 protocol are defined in the XDR RFC4506 [2] and RPC RFC1831 [3]
[3] documents. The next sections build upon the XDR data types to documents. The next sections build upon the XDR data types to define
define types and structures specific to this protocol. constants, types and structures specific to this protocol.
3.1. Basic Data Types 3.1. Basic Constants
These are the base NFSv4 data types. const NFS4_FHSIZE = 128;
const NFS4_VERIFIER_SIZE = 8;
const NFS4_OPAQUE_LIMIT = 1024;
const NFS4_SESSIONID_SIZE = 16;
const NFS4_INT64_MAX = 0x7fffffffffffffff;
const NFS4_UINT64_MAX = 0xffffffffffffffff;
const NFS4_INT32_MAX = 0x7fffffff;
const NFS4_UINT32_MAX = 0xffffffff;
const NFS4_MAXFILELEN = 0xffffffffffffffff;
const NFS4_MAXFILEOFF = 0xfffffffffffffffe;
Except where noted, all these constants are defined in bytes.
o NFS4_FHSIZE is the maximum size of a filehandle.
o NFS4_VERIFIER_SIZE is the fixed size of a verifier.
o NFS4_OPAQUE_LIMIT is the maximum size of certain opaque
information.
o NFS4_SESSIONID_SIZE is the fixed size of a session identifier.
o NFS4_INT64_MAX is the maximum value of a signed 64 bit integer.
o NFS4_UINT64_MAX is the maximum value of an unsigned 64 bit
integer.
o NFS4_INT32_MAX is the maximum value of a signed 32 bit integer.
o NFS4_UINT32_MAX is the maximum value of an unsigned 32 bit
integer.
o NFS4_MAXFILELEN is the maximum length of a regular file.
o NFS4_MAXFILEOFF is the maximum offset into a regular file.
3.2. Basic Data Types
These are the base NFSv4.1 data types.
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
| Data Type | Definition | | Data Type | Definition |
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
| int32_t | typedef int int32_t; | | int32_t | typedef int int32_t; |
| uint32_t | typedef unsigned int uint32_t; | | uint32_t | typedef unsigned int uint32_t; |
| int64_t | typedef hyper int64_t; | | int64_t | typedef hyper int64_t; |
| uint64_t | typedef unsigned hyper uint64_t; | | uint64_t | typedef unsigned hyper uint64_t; |
| attrlist4<> | typedef opaque attrlist4<>; | | attrlist4<> | typedef opaque attrlist4<>; |
| | Used for file/directory attributes | | | Used for file/directory attributes |
skipping to change at page 75, line 16 skipping to change at page 77, line 9
| count4 | typedef uint32_t count4; | | count4 | typedef uint32_t count4; |
| | Various count parameters (READ, WRITE, | | | Various count parameters (READ, WRITE, |
| | COMMIT) | | | COMMIT) |
| length4 | typedef uint64_t length4; | | length4 | typedef uint64_t length4; |
| | Describes LOCK lengths | | | Describes LOCK lengths |
| mode4 | typedef uint32_t mode4; | | mode4 | typedef uint32_t mode4; |
| | Mode attribute data type | | | Mode attribute data type |
| nfs_cookie4 | typedef uint64_t nfs_cookie4; | | nfs_cookie4 | typedef uint64_t nfs_cookie4; |
| | Opaque cookie value for READDIR | | | Opaque cookie value for READDIR |
| nfs_fh4<NFS4_FHSIZE> | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | nfs_fh4<NFS4_FHSIZE> | typedef opaque nfs_fh4<NFS4_FHSIZE>; |
| | Filehandle definition; NFS4_FHSIZE is | | | Filehandle definition |
| | defined as 128 |
| nfs_ftype4 | enum nfs_ftype4; | | nfs_ftype4 | enum nfs_ftype4; |
| | Various defined file types | | | Various defined file types |
| nfsstat4 | enum nfsstat4; | | nfsstat4 | enum nfsstat4; |
| | Return value for operations | | | Return value for operations |
| offset4 | typedef uint64_t offset4; | | offset4 | typedef uint64_t offset4; |
| | Various offset designations (READ, WRITE, | | | Various offset designations (READ, WRITE, |
| | LOCK, COMMIT) | | | LOCK, COMMIT) |
| qop4 | typedef uint32_t qop4; | | qop4 | typedef uint32_t qop4; |
| | Quality of protection designation in | | | Quality of protection designation in |
| | SECINFO | | | SECINFO |
skipping to change at page 75, line 41 skipping to change at page 77, line 33
| | it contains an ASN.1 OBJECT IDENTIFIER as | | | it contains an ASN.1 OBJECT IDENTIFIER as |
| | used by GSS-API in the mech_type argument | | | used by GSS-API in the mech_type argument |
| | to GSS_Init_sec_context. See [7] for | | | to GSS_Init_sec_context. See [7] for |
| | details. | | | details. |
| sequenceid4 | typedef uint32_t sequenceid4; | | sequenceid4 | typedef uint32_t sequenceid4; |
| | sequence number used for various session | | | sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). | | | SEQUENCE, CB_SEQUENCE). |
| seqid4 | typedef uint32_t seqid4; | | seqid4 | typedef uint32_t seqid4; |
| | Sequence identifier used for file locking | | | Sequence identifier used for file locking |
| sessionid4 | typedef opaque sessionid4[16]; | | sessionid4 | typedef opaque |
| | sessionid4[NFS4_SESSIONID_SIZE]; |
| | Session identifier | | | Session identifier |
| slotid4 | typedef uint32_t slotid4; | | slotid4 | typedef uint32_t slotid4; |
| | sequencing artifact for various session | | | sequencing artifact for various session |
| | operations (SEQUENCE, CB_SEQUENCE). | | | operations (SEQUENCE, CB_SEQUENCE). |
| utf8string<> | typedef opaque utf8string<>; | | utf8string<> | typedef opaque utf8string<>; |
| | UTF-8 encoding for strings | | | UTF-8 encoding for strings |
| utf8str_cis | typedef utf8string utf8str_cis; | | utf8str_cis | typedef utf8string utf8str_cis; |
| | Case-insensitive UTF-8 string | | | Case-insensitive UTF-8 string |
| utf8str_cs | typedef utf8string utf8str_cs; | | utf8str_cs | typedef utf8string utf8str_cs; |
| | Case-sensitive UTF-8 string | | | Case-sensitive UTF-8 string |
skipping to change at page 76, line 25 skipping to change at page 78, line 18
| | Verifier used for various operations | | | Verifier used for various operations |
| | (COMMIT, CREATE, EXCHANGE_ID, OPEN, | | | (COMMIT, CREATE, EXCHANGE_ID, OPEN, |
| | READDIR, WRITE) NFS4_VERIFIER_SIZE is | | | READDIR, WRITE) NFS4_VERIFIER_SIZE is |
| | defined as 8. | | | defined as 8. |
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
End of Base Data Types End of Base Data Types
Table 1 Table 1
3.2. Structured Data Types 3.3. Structured Data Types
3.2.1. nfstime4 3.3.1. nfstime4
struct nfstime4 { struct nfstime4 {
int64_t seconds; int64_t seconds;
uint32_t nseconds; uint32_t nseconds;
}; };
The nfstime4 structure gives the number of seconds and nanoseconds The nfstime4 structure gives the number of seconds and nanoseconds
since midnight or 0 hour January 1, 1970 Coordinated Universal Time since midnight or 0 hour January 1, 1970 Coordinated Universal Time
(UTC). Values greater than zero for the seconds field denote dates (UTC). Values greater than zero for the seconds field denote dates
after the 0 hour January 1, 1970. Values less than zero for the after the 0 hour January 1, 1970. Values less than zero for the
skipping to change at page 77, line 5 skipping to change at page 78, line 46
nseconds fields would have a value of one-half second (500000000). nseconds fields would have a value of one-half second (500000000).
Values greater than 999,999,999 for nseconds are considered invalid. Values greater than 999,999,999 for nseconds are considered invalid.
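As an illustration of how the two nfstime4 fields combine, the Python sketch below splits a signed nanosecond count into seconds and nseconds and packs the result in XDR order. The function names are hypothetical, and the sketch assumes that nseconds is always counted forward from the seconds value, which is what the one-half second example implies.

```python
import struct

NSEC_PER_SEC = 1000000000


def to_nfstime4(total_nanoseconds):
    """Split a signed nanosecond count since the epoch into the two fields.

    Python's divmod floors toward negative infinity, so nseconds is
    always in the range 0..999,999,999, even for times before 1970.
    """
    seconds, nseconds = divmod(total_nanoseconds, NSEC_PER_SEC)
    return seconds, nseconds


def encode_nfstime4(seconds, nseconds):
    """XDR: int64_t seconds followed by uint32_t nseconds, big-endian."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds above 999,999,999 is invalid")
    return struct.pack(">qI", seconds, nseconds)
```

For a time one-half second before the epoch, to_nfstime4(-500000000) yields seconds of -1 and nseconds of 500000000.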
This data type is used to pass time and date information. A server This data type is used to pass time and date information. A server
converts to and from its local representation of time when processing converts to and from its local representation of time when processing
time values, preserving as much accuracy as possible. If the time values, preserving as much accuracy as possible. If the
precision of timestamps stored for a file system object is less than precision of timestamps stored for a file system object is less than
defined, loss of precision can occur. An adjunct time maintenance defined, loss of precision can occur. An adjunct time maintenance
protocol is recommended to reduce client and server time skew. protocol is recommended to reduce client and server time skew.
3.2.2. time_how4 3.3.2. time_how4
enum time_how4 { enum time_how4 {
SET_TO_SERVER_TIME4 = 0, SET_TO_SERVER_TIME4 = 0,
SET_TO_CLIENT_TIME4 = 1 SET_TO_CLIENT_TIME4 = 1
}; };
3.2.3. settime4 3.3.3. settime4
union settime4 switch (time_how4 set_it) { union settime4 switch (time_how4 set_it) {
case SET_TO_CLIENT_TIME4: case SET_TO_CLIENT_TIME4:
nfstime4 time; nfstime4 time;
default: default:
void; void;
}; };
The above definitions are used as the attribute definitions to set The above definitions are used as the attribute definitions to set
time values. If set_it is SET_TO_SERVER_TIME4, then the server uses time values. If set_it is SET_TO_SERVER_TIME4, then the server uses
its local representation of time for the time value. its local representation of time for the time value.
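The settime4 union can be illustrated with a short Python encoder. It is a sketch under the usual XDR rules (a 4-byte discriminant followed by the selected arm); the function name and the (seconds, nseconds) tuple argument are assumptions made for the example.

```python
import struct

SET_TO_SERVER_TIME4 = 0
SET_TO_CLIENT_TIME4 = 1


def encode_settime4(set_it, time=None):
    """Encode the settime4 discriminated union in XDR form.

    time is a hypothetical (seconds, nseconds) pair and is only
    consulted for SET_TO_CLIENT_TIME4; every other arm is void.
    """
    head = struct.pack(">i", set_it)  # enum discriminant
    if set_it == SET_TO_CLIENT_TIME4:
        if time is None:
            raise ValueError("SET_TO_CLIENT_TIME4 requires a time value")
        seconds, nseconds = time
        return head + struct.pack(">qI", seconds, nseconds)
    return head
```

With SET_TO_SERVER_TIME4 only the 4-byte discriminant goes on the wire, and the server supplies the time from its own clock.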
3.2.4. specdata4 3.3.4. specdata4
struct specdata4 { struct specdata4 {
uint32_t specdata1; /* major device number */ uint32_t specdata1; /* major device number */
uint32_t specdata2; /* minor device number */ uint32_t specdata2; /* minor device number */
}; };
This data type represents additional information for the device file This data type represents additional information for the device file
types NF4CHR and NF4BLK. types NF4CHR and NF4BLK.
3.2.5. fsid4 3.3.5. fsid4
struct fsid4 { struct fsid4 {
uint64_t major; uint64_t major;
uint64_t minor; uint64_t minor;
}; };
3.2.6. chg_policy4 3.3.6. chg_policy4
struct change_policy4 { struct change_policy4 {
uint64_t cp_major; uint64_t cp_major;
uint64_t cp_minor; uint64_t cp_minor;
}; };
The chg_policy4 data type is used for the change_policy recommended The chg_policy4 data type is used for the change_policy recommended
attribute. It provides change sequencing indication analogous to the attribute. It provides change sequencing indication analogous to the
change attribute. To enable the server to present a value valid change attribute. To enable the server to present a value valid
across server re-initialization without requiring persistent storage, across server re-initialization without requiring persistent storage,
two 64-bit quantities are used, allowing one to be a server instance two 64-bit quantities are used, allowing one to be a server instance
id and the second to be incremented non-persistently, within a given id and the second to be incremented non-persistently, within a given
server instance. server instance.
3.2.7. fs_location4 3.3.7. fattr4
struct fs_location4 {
utf8str_cis server<>;
pathname4 rootpath;
};
3.2.8. fs_locations4
struct fs_locations4 {
pathname4 fs_root;
fs_location4 locations<>;
};
The fs_location4 and fs_locations4 data types are used for the
fs_locations recommended attribute which is used for migration and
replication support.
3.2.9. fattr4
struct fattr4 { struct fattr4 {
bitmap4 attrmask; bitmap4 attrmask;
attrlist4 attr_vals; attrlist4 attr_vals;
}; };
The fattr4 structure is used to represent file and directory The fattr4 structure is used to represent file and directory
attributes. attributes.
The bitmap is a counted array of 32 bit integers used to contain bit The bitmap is a counted array of 32 bit integers used to contain bit
values. The position of the integer in the array that contains bit n values. The position of the integer in the array that contains bit n
can be computed from the expression (n / 32) and its bit within that can be computed from the expression (n / 32) and its bit within that
integer is (n mod 32). integer is (n mod 32).
0 1 0 1
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 | | count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
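The word and bit arithmetic for the bitmap can be sketched in Python as below. The helper names are hypothetical; only the (n / 32) and (n mod 32) rule and the counted-array layout come from the text.

```python
import struct


def bits_to_bitmap4(bit_numbers):
    """Build the list of 32 bit words that has the given bits set."""
    words = []
    for n in bit_numbers:
        index, bit = divmod(n, 32)  # word n / 32, bit n mod 32
        while len(words) <= index:
            words.append(0)
        words[index] |= 1 << bit
    return words


def bitmap4_has(words, n):
    """Test bit n; bits beyond the end of the array count as clear."""
    index, bit = divmod(n, 32)
    return index < len(words) and bool(words[index] & (1 << bit))


def encode_bitmap4(words):
    """XDR counted array: a uint32 count followed by each uint32 word."""
    return struct.pack(">I%dI" % len(words), len(words), *words)
```

For instance, bits 0 and 33 land in word 0 (bit 0) and word 1 (bit 1), so the encoded form carries a count of 2 followed by the two words.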
3.2.10. change_info4 3.3.8. change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
}; };
This structure is used with the CREATE, LINK, REMOVE, RENAME This structure is used with the CREATE, LINK, REMOVE, RENAME
operations to let the client know the value of the change attribute operations to let the client know the value of the change attribute
for the directory in which the target file system object resides. for the directory in which the target file system object resides.
3.2.11. netaddr4 3.3.9. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 structure is used to identify TCP/IP based endpoints. The netaddr4 structure is used to identify TCP/IP based endpoints.
The r_netid and r_addr fields are specified in RFC1833 [26], but they The r_netid and r_addr fields are specified in RFC1833 [25], but they
are underspecified in RFC1833 [26] as far as what they should look are underspecified in RFC1833 [25] as far as what they should look
like for specific protocols. like for specific protocols.
For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the
US-ASCII string: US-ASCII string:
h1.h2.h3.h4.p1.p2 h1.h2.h3.h4.p1.p2
The prefix, "h1.h2.h3.h4", is the standard textual form for The prefix, "h1.h2.h3.h4", is the standard textual form for
representing an IPv4 address, which is always four bytes long. representing an IPv4 address, which is always four bytes long.
Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, Assuming big-endian ordering, h1, h2, h3, and h4, are respectively,
skipping to change at page 80, line 20 skipping to change at page 81, line 41
representing an IPv6 address as defined in Section 2.2 of RFC2373 representing an IPv6 address as defined in Section 2.2 of RFC2373
[12]. Additionally, the two alternative forms specified in Section [12]. Additionally, the two alternative forms specified in Section
2.2 of RFC2373 [12] are also acceptable. 2.2 of RFC2373 [12] are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
over IPv6 the value of r_netid is the string "udp6". That this over IPv6 the value of r_netid is the string "udp6". That this
document specifies the universal address and netid for UDP/IPv6 does document specifies the universal address and netid for UDP/IPv6 does
not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
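As a rough illustration of the IPv4 universal address form "h1.h2.h3.h4.p1.p2", the sketch below builds and parses such a string. It assumes the usual universal address convention in which p1 and p2 are the high-order and low-order octets of the port, each in ASCII decimal; that detail falls in the part of the text elided from this comparison. The function names are hypothetical.

```python
def ipv4_uaddr(host, port):
    """Build an 'h1.h2.h3.h4.p1.p2' string from a dotted quad and a port."""
    octets = [int(x) for x in host.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % host)
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range: %r" % port)
    # Assumption: p1 is the high-order octet of the port, p2 the low-order.
    fields = octets + [port >> 8, port & 0xFF]
    return ".".join(str(f) for f in fields)


def parse_ipv4_uaddr(uaddr):
    """Split an 'h1.h2.h3.h4.p1.p2' string back into (host, port)."""
    parts = [int(x) for x in uaddr.split(".")]
    if len(parts) != 6 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("not an IPv4 universal address: %r" % uaddr)
    host = ".".join(str(p) for p in parts[:4])
    return host, (parts[4] << 8) | parts[5]
```

With these helpers, the address 192.0.2.1 and port 2049 would be carried in na_r_addr as "192.0.2.1.8.1" alongside an na_r_netid of "tcp".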
3.2.12. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
typedef state_owner4 open_owner4; typedef state_owner4 open_owner4;
typedef state_owner4 lock_owner4; typedef state_owner4 lock_owner4;
The state_owner4 data type is the base type for the open_owner4 The state_owner4 data type is the base type for the open_owner4
Section 3.2.12.1 and lock_owner4 Section 3.2.12.2. NFS4_OPAQUE_LIMIT Section 3.3.10.1 and lock_owner4 Section 3.3.10.2.
is defined as 1024.
3.2.12.1. open_owner4 3.3.10.1. open_owner4
This structure is used to identify the owner of open state. This structure is used to identify the owner of open state.
3.2.12.2. lock_owner4 3.3.10.2. lock_owner4
This structure is used to identify the owner of file locking state. This structure is used to identify the owner of file locking state.
3.2.13. open_to_lock_owner4 3.3.11. open_to_lock_owner4
struct open_to_lock_owner4 { struct open_to_lock_owner4 {
seqid4 open_seqid; seqid4 open_seqid;
stateid4 open_stateid; stateid4 open_stateid;
seqid4 lock_seqid; seqid4 lock_seqid;
lock_owner4 lock_owner; lock_owner4 lock_owner;
}; };
This structure is used for the first LOCK operation done for an This structure is used for the first LOCK operation done for an
open_owner4. It provides both the open_stateid and lock_owner such open_owner4. It provides both the open_stateid and lock_owner such
that the transition is made from a valid open_stateid sequence to that the transition is made from a valid open_stateid sequence to
that of the new lock_stateid sequence. Using this mechanism avoids that of the new lock_stateid sequence. Using this mechanism avoids
the confirmation of the lock_owner/lock_seqid pair since it is tied the confirmation of the lock_owner/lock_seqid pair since it is tied
to established state in the form of the open_stateid/open_seqid. to established state in the form of the open_stateid/open_seqid.
3.2.14. stateid4 3.3.12. stateid4
struct stateid4 { struct stateid4 {
uint32_t seqid; uint32_t seqid;
opaque other[12]; opaque other[12];
}; };
This structure is used for the various state sharing mechanisms This structure is used for the various state sharing mechanisms
between the client and server. For the client, this data structure between the client and server. For the client, this data structure
is read-only. The starting value of the seqid field is undefined. is read-only. The starting value of the seqid field is undefined.
The server is required to increment the seqid field monotonically at The server is required to increment the seqid field monotonically at
each transition of the stateid. This is important since the client each transition of the stateid. This is important since the client
will inspect the seqid in OPEN stateids to determine the order of will inspect the seqid in OPEN stateids to determine the order of
OPEN processing done by the server. OPEN processing done by the server.
3.2.15. layouttype4 3.3.13. layouttype4
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1, LAYOUT4_NFSV4_1_FILES = 1,
LAYOUT4_OSD2_OBJECTS = 2, LAYOUT4_OSD2_OBJECTS = 2,
LAYOUT4_BLOCK_VOLUME = 3 LAYOUT4_BLOCK_VOLUME = 3
}; };
A layout type specifies the layout being used. The implication is This data type indicates what type of layout is being used. The file
that clients have "layout drivers" that support one or more layout server advertises the layout types it supports through the
types. The file server advertises the layout types it supports fs_layout_type file system attribute (Section 5.11.1). A client asks
through the fs_layout_type file system attribute (Section 5.11.1). A for layouts of a particular type in LAYOUTGET, and processes those
client asks for layouts of a particular type in LAYOUTGET, and passes layouts in its layout-type-specific logic.
those layouts to its layout driver.
The layouttype4 structure is 32 bits in length. The range The layouttype4 structure is 32 bits in length. The range
represented by the layout type is split into three parts. Type 0x0 represented by the layout type is split into three parts. Type 0x0
is reserved. Types within the range 0x00000001-0x7FFFFFFF are is reserved. Types within the range 0x00000001-0x7FFFFFFF are
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.4; they are maintained by IANA. Types within the range Section 22.4; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for "private use" only. 0x80000000-0xFFFFFFFF are site specific and for "private use" only.
The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file
layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration
specifies that the object layout, as defined in [29], is to be used. specifies that the object layout, as defined in [29], is to be used.
Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the
block/volume layout, as defined in [30], is to be used. block/volume layout, as defined in [30], is to be used.
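The three-way split of the layout type number space can be expressed as a small classifier. This is a minimal sketch; the function name and the returned labels are hypothetical and chosen only for illustration.

```python
def classify_layouttype(value):
    """Classify a 32-bit layouttype4 value by the range it falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layouttype4 is a 32-bit quantity")
    if value == 0:
        return "reserved"
    if value <= 0x7FFFFFFF:
        return "iana"       # globally unique, assigned through IANA
    return "private"        # site specific, "private use" only
```

Under this split the three types defined above (values 1 through 3) all fall in the IANA-maintained range.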
3.2.16. deviceid4 3.3.14. deviceid4
struct deviceid4 { const NFS4_DEVICEID4_SIZE = 16;
uint64_t did_major;
uint64_t did_minor; typedef opaque deviceid4[NFS4_DEVICEID4_SIZE];
};
Layout information includes device IDs that specify a storage device Layout information includes device IDs that specify a storage device
through a compact handle. Addressing and type information is through a compact handle. Addressing and type information is
obtained with the GETDEVICEINFO operation. A client must not assume obtained with the GETDEVICEINFO operation. A client must not assume
that device IDs are valid across metadata server reboots. The device that device IDs are valid across metadata server reboots. The device
ID is qualified by the layout type and is unique per file system ID is qualified by the layout type and is unique per file system
(FSID). See Section 12.2.10 for more details. (FSID). See Section 12.2.10 for more details.
3.2.17. device_addr4 3.3.15. device_addr4
struct device_addr4 { struct device_addr4 {
layouttype4 da_layout_type; layouttype4 da_layout_type;
opaque da_addr_body<>; opaque da_addr_body<>;
}; };
The device address is used to set up a communication channel with the The device address is used to set up a communication channel with the
storage device. Different layout types will require different types storage device. Different layout types will require different types
of structures to define how they communicate with storage devices. of structures to define how they communicate with storage devices.
The opaque da_addr_body field must be interpreted based on the The opaque da_addr_body field must be interpreted based on the
specified da_layout_type field. specified da_layout_type field.
This document defines the device address for the NFSv4.1 file layout This document defines the device address for the NFSv4.1 file layout
(see Section 13.3), which identifies a storage device by network IP (see Section 13.3), which identifies a storage device by network IP
address and port number. This is sufficient for the clients to address and port number. This is sufficient for the clients to
communicate with the NFSv4.1 storage devices, and may be sufficient communicate with the NFSv4.1 storage devices, and may be sufficient
for other layout types as well. Device types for object storage for other layout types as well. Device types for object storage
devices and block storage devices (e.g., SCSI volume labels) will be devices and block storage devices (e.g., SCSI volume labels) will be
defined by their respective layout specifications. defined by their respective layout specifications.
3.2.18. devlist_item4 3.3.16. devlist_item4
struct devlist_item4 { struct devlist_item4 {
deviceid4 dli_id; deviceid4 dli_id;
stateid4 dli_stateid;
device_addr4 dli_device_addr; device_addr4 dli_device_addr;
}; };
An array of these values is returned by the GETDEVICELIST operation. An array of these values is returned by the GETDEVICELIST operation.
They define the set of devices associated with a file system for the They define the set of devices associated with a file system for the
layout type specified in the GETDEVICELIST4args. layout type specified in the GETDEVICELIST4args.
3.2.19. layout_content4 3.3.17. layout_content4
struct layout_content4 { struct layout_content4 {
layouttype4 loc_type; layouttype4 loc_type;
opaque loc_body<>; opaque loc_body<>;
}; };
The loc_body field must be interpreted based on the layout type The loc_body field must be interpreted based on the layout type
(loc_type). This document defines the loc_body for the NFSv4.1 file (loc_type). This document defines the loc_body for the NFSv4.1 file
layout type; see Section 13.3 for its definition. layout type; see Section 13.3 for its definition.
3.2.20. layout4 3.3.18. layout4
struct layout4 { struct layout4 {
offset4 lo_offset; offset4 lo_offset;
length4 lo_length; length4 lo_length;
layoutiomode4 lo_iomode; layoutiomode4 lo_iomode;
layout_content4 lo_content; layout_content4 lo_content;
}; };
The layout4 structure defines a layout for a file. The layout type The layout4 structure defines a layout for a file. The layout type
specific data is opaque within lo_content. Since layouts are sub- specific data is opaque within lo_content. Since layouts are sub-
dividable, the offset and length together with the file's filehandle, dividable, the offset and length together with the file's filehandle,
the client ID, iomode, and layout type, identifies the layout. the client ID, iomode, and layout type, identify the layout.
3.2.21. layoutupdate4 3.3.19. layoutupdate4
struct layoutupdate4 { struct layoutupdate4 {
layouttype4 lou_type; layouttype4 lou_type;
opaque lou_body<>; opaque lou_body<>;
}; };
The layoutupdate4 structure is used by the client to return 'updated' The layoutupdate4 structure is used by the client to return 'updated'
layout information to the metadata server at LAYOUTCOMMIT time. This layout information to the metadata server at LAYOUTCOMMIT time. This
structure provides a channel to pass layout type specific information structure provides a channel to pass layout type specific information
(in field lou_body) back to the metadata server. E.g., for block/ (in field lou_body) back to the metadata server. E.g., for block/
volume layout types this could include the list of reserved blocks volume layout types this could include the list of reserved blocks
that were written. The contents of the opaque lou_body argument are that were written. The contents of the opaque lou_body argument are
determined by the layout type and are defined in their context. The determined by the layout type and are defined in their context. The
NFSv4.1 file-based layout does not use this structure, thus the NFSv4.1 file-based layout does not use this structure, thus the
lou_body field should have a zero length. lou_body field should have a zero length.
3.2.22. layouthint4 3.3.20. layouthint4
struct layouthint4 { struct layouthint4 {
layouttype4 loh_type; layouttype4 loh_type;
opaque loh_body<>; opaque loh_body<>;
}; };
The layouthint4 structure is used by the client to pass in a hint The layouthint4 structure is used by the client to pass in a hint
about the type of layout it would like created for a particular file. about the type of layout it would like created for a particular file.
It is the structure specified by the layout_hint attribute described It is the structure specified by the layout_hint attribute described
in Section 5.11.4. The metadata server may ignore the hint, or may in Section 5.11.4. The metadata server may ignore the hint, or may
selectively ignore fields within the hint. This hint should be selectively ignore fields within the hint. This hint should be
provided at create time as part of the initial attributes within provided at create time as part of the initial attributes within
OPEN. The loh_body field is specific to the type of layout OPEN. The loh_body field is specific to the type of layout
(loh_type). The NFSv4.1 file-based layout uses the (loh_type). The NFSv4.1 file-based layout uses the
nfsv4_1_file_layouthint4 structure as defined in Section 13.3. nfsv4_1_file_layouthint4 structure as defined in Section 13.3.
3.2.23. layoutiomode4 3.3.21. layoutiomode4
enum layoutiomode4 { enum layoutiomode4 {
LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_READ = 1,
LAYOUTIOMODE4_RW = 2, LAYOUTIOMODE4_RW = 2,
LAYOUTIOMODE4_ANY = 3 LAYOUTIOMODE4_ANY = 3
}; };
The iomode specifies whether the client intends to read or write The iomode specifies whether the client intends to read or write
(with the possibility of reading) the data represented by the layout. (with the possibility of reading) the data represented by the layout.
The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be
used for LAYOUTRETURN and LAYOUTRECALL. The ANY iomode specifies used for LAYOUTRETURN and CB_LAYOUTRECALL. The ANY iomode specifies
that layouts pertaining to both READ and RW iomodes are being that layouts pertaining to both READ and RW iomodes are being
returned or recalled, respectively. The metadata server's use of the returned or recalled, respectively. The metadata server's use of the
iomode may depend on the layout type being used. The storage devices iomode may depend on the layout type being used. The storage devices
may validate I/O accesses against the iomode and reject invalid may validate I/O accesses against the iomode and reject invalid
accesses. accesses.
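The restriction on the ANY iomode can be checked mechanically. The sketch below is an assumption-laden illustration: the enum mirrors the layoutiomode4 values given above, while the function name and the use of operation names as strings are hypothetical.

```python
from enum import IntEnum


class LayoutIOMode4(IntEnum):
    READ = 1
    RW = 2
    ANY = 3


def iomode_allowed(operation, iomode):
    """Return True if the iomode may be used with the named operation."""
    iomode = LayoutIOMode4(iomode)  # raises ValueError for unknown values
    if operation == "LAYOUTGET":
        # ANY MUST NOT be used when requesting a layout.
        return iomode is not LayoutIOMode4.ANY
    if operation in ("LAYOUTRETURN", "CB_LAYOUTRECALL"):
        # ANY covers both READ and RW layouts being returned or recalled.
        return True
    raise ValueError("operation does not carry an iomode: %r" % operation)
```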
3.2.24. nfs_impl_id4 3.3.22. nfs_impl_id4
struct nfs_impl_id4 { struct nfs_impl_id4 {
utf8str_cis nii_domain; utf8str_cis nii_domain;
utf8str_cs nii_name; utf8str_cs nii_name;
nfstime4 nii_date; nfstime4 nii_date;
}; };
This structure is used to identify client and server implementation This structure is used to identify client and server implementation
detail. The nii_domain field is the DNS domain name that the detail. The nii_domain field is the DNS domain name that the
implementer is associated with. The nii_name field is the product implementer is associated with. The nii_name field is the product
name of the implementation and is completely free form. It is name of the implementation and is completely free form. It is
recommended that the nii_name be used to distinguish machine recommended that the nii_name be used to distinguish machine
architecture, machine platforms, revisions, versions, and patch architecture, machine platforms, revisions, versions, and patch
levels. The nii_date field is the timestamp of when the software levels. The nii_date field is the timestamp of when the software
instance was published or built. instance was published or built.
3.2.25. threshold_item4 3.3.23. threshold_item4
struct threshold_item4 { struct threshold_item4 {
layouttype4 thi_layout_type; layouttype4 thi_layout_type;
bitmap4 thi_hintset; bitmap4 thi_hintset;
opaque thi_hintlist<>; opaque thi_hintlist<>;
}; };
This structure contains a list of hints specific to a layout type for This structure contains a list of hints specific to a layout type for
helping the client determine when it should issue I/O directly helping the client determine when it should send I/O directly through
through the metadata server vs. the data servers. The hint structure the metadata server vs. the data servers. The hint structure
consists of the layout type (thi_layout_type), a bitmap (thi_hintset) consists of the layout type (thi_layout_type), a bitmap (thi_hintset)
describing the set of hints supported by the server (they may differ describing the set of hints supported by the server (they may differ
based on the layout type), and a list of hints (thi_hintlist), whose based on the layout type), and a list of hints (thi_hintlist), whose
structure is determined by the hintset bitmap. See the mdsthreshold structure is determined by the hintset bitmap. See the mdsthreshold
attribute for more details. attribute for more details.
The thi_hintset field is a bitmap of the following values: The thi_hintset field is a bitmap of the following values:
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
| name | # | Data | Description | | name | # | Data | Description |
skipping to change at page 85, line 45 skipping to change at page 87, line 14
| threshold4_read_iosize | 2 | length4 | For read I/O sizes below | | threshold4_read_iosize | 2 | length4 | For read I/O sizes below |
| | | | this threshold it is | | | | | this threshold it is |
| | | | recommended to read data | | | | | recommended to read data |
| | | | through the MDS | | | | | through the MDS |
| threshold4_write_iosize | 3 | length4 | For write I/O sizes below | | threshold4_write_iosize | 3 | length4 | For write I/O sizes below |
| | | | this threshold it is | | | | | this threshold it is |
| | | | recommended to write data | | | | | recommended to write data |
| | | | through the MDS | | | | | through the MDS |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
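A client might apply the I/O size hints roughly as follows. This is a sketch under stated assumptions: the hints are taken to be already decoded from thi_hintlist into a mapping from hint bit number to length4 value, only the two I/O size hints visible in the table above (bits 2 and 3) are considered, and the constant and function names are hypothetical.

```python
THRESHOLD4_READ_IOSIZE = 2   # bit number from the table above
THRESHOLD4_WRITE_IOSIZE = 3  # bit number from the table above


def prefer_mds(hints, is_write, io_size):
    """Return True if the hints recommend sending this I/O through the MDS.

    hints maps a hint bit number to its decoded length4 value; a hint
    that the server did not supply is simply absent from the mapping.
    """
    bit = THRESHOLD4_WRITE_IOSIZE if is_write else THRESHOLD4_READ_IOSIZE
    threshold = hints.get(bit)
    # Sizes below the threshold are recommended to go through the MDS.
    return threshold is not None and io_size < threshold
```

For example, with a read threshold of 4096, a 512-byte read would be directed to the metadata server while a 64 KB read would go to the data servers. The hints are advisory, so a client remains free to ignore them.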
3.2.26. mdsthreshold4 3.3.24. mdsthreshold4
struct mdsthreshold4 { struct mdsthreshold4 {
threshold_item4 mth_hints<>; threshold_item4 mth_hints<>;
}; };
This structure holds an array of threshold_item4 structures each of This structure holds an array of threshold_item4 structures each of
which is valid for a particular layout type. An array is necessary which is valid for a particular layout type. An array is necessary
since a server can support multiple layout types for a single file. since a server can support multiple layout types for a single file.
4. Filehandles 4. Filehandles
skipping to change at page 86, line 18 skipping to change at page 87, line 36
The filehandle in the NFS protocol is a per server unique identifier The filehandle in the NFS protocol is a per server unique identifier
for a file system object. The contents of the filehandle are opaque for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the file system the filehandle to an internal representation of the file system
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
The operations of the NFS protocol are defined in terms of one or The operations of the NFS protocol are defined in terms of one or
more filehandles. Therefore, the client needs a filehandle to more filehandles. Therefore, the client needs a filehandle to
initiate communication with the server. With the NFS version 2 initiate communication with the server. With the NFSv3 protocol
protocol RFC1094 [21] and the NFS version 3 protocol RFC1813 [22], RFC1813 [21], there exists an ancillary protocol to obtain this first
there exists an ancillary protocol to obtain this first filehandle. filehandle. The MOUNT protocol, RPC program number 100005, provides
The MOUNT protocol, RPC program number 100005, provides the mechanism the mechanism of translating a string based file system path name to
of translating a string based file system path name to a filehandle a filehandle which can then be used by the NFS protocols.
which can then be used by the NFS protocols.
The MOUNT protocol has deficiencies in the area of security and use The MOUNT protocol has deficiencies in the area of security and use
via firewalls. This is one reason that the use of the public via firewalls. This is one reason that the use of the public
filehandle was introduced in RFC2054 [31] and RFC2055 [32]. With the filehandle was introduced in RFC2054 [31] and RFC2055 [32]. With the
use of the public filehandle in combination with the LOOKUP operation use of the public filehandle in combination with the LOOKUP operation
in the NFS version 2 and 3 protocols, it has been demonstrated that in the NFSv3 protocol, it has been demonstrated that the MOUNT
the MOUNT protocol is unnecessary for viable interaction between NFS protocol is unnecessary for viable interaction between NFS client and
client and server. server.
Therefore, the NFS version 4 protocol will not use an ancillary Therefore, the NFSv4.1 protocol will not use an ancillary protocol
protocol for translation from string based path names to a for translation from string based path names to a filehandle. Two
filehandle. Two special filehandles will be used as starting points special filehandles will be used as starting points for the NFS
for the NFS client. client.
4.1.1. Root Filehandle 4.1.1. Root Filehandle
The first of the special filehandles is the ROOT filehandle. The The first of the special filehandles is the ROOT filehandle. The
ROOT filehandle is the "conceptual" root of the file system name ROOT filehandle is the "conceptual" root of the file system name
space at the NFS server. The client uses or starts with the ROOT space at the NFS server. The client uses or starts with the ROOT
filehandle by employing the PUTROOTFH operation. The PUTROOTFH filehandle by employing the PUTROOTFH operation. The PUTROOTFH
operation instructs the server to set the "current" filehandle to the operation instructs the server to set the "current" filehandle to the
ROOT of the server's file tree. Once this PUTROOTFH operation is ROOT of the server's file tree. Once this PUTROOTFH operation is
used, the client can then traverse the entirety of the server's file used, the client can then traverse the entirety of the server's file
tree with the LOOKUP operation. A complete discussion of the server tree with the LOOKUP operation. A complete discussion of the server
name space is in the section "NFS Server Name Space". name space is in Section 7.
4.1.2. Public Filehandle 4.1.2. Public Filehandle
The second special filehandle is the PUBLIC filehandle. Unlike the The second special filehandle is the PUBLIC filehandle. Unlike the
ROOT filehandle, the PUBLIC filehandle may be bound or represent an ROOT filehandle, the PUBLIC filehandle may be bound or represent an
arbitrary file system object at the server. The server is arbitrary file system object at the server. The server is
responsible for this binding. It may be that the PUBLIC filehandle responsible for this binding. It may be that the PUBLIC filehandle
and the ROOT filehandle refer to the same file system object. and the ROOT filehandle refer to the same file system object.
However, it is up to the administrative software at the server and However, it is up to the administrative software at the server and
the policies of the server administrator to define the binding of the the policies of the server administrator to define the binding of the
PUBLIC filehandle and server file system object. The client may not PUBLIC filehandle and server file system object. The client may not
make any assumptions about this binding. The client uses the PUBLIC make any assumptions about this binding. The client uses the PUBLIC
filehandle via the PUTPUBFH operation. filehandle via the PUTPUBFH operation.
4.2. Filehandle Types 4.2. Filehandle Types
In the NFS version 2 and 3 protocols, there was one type of In the NFSv3 protocol, there was one type of filehandle with a single
filehandle with a single set of semantics. This type of filehandle set of semantics. This type of filehandle is termed "persistent" in
is termed "persistent" in NFS Version 4. The semantics of a NFSv4.1. The semantics of a persistent filehandle remain the same as
persistent filehandle remain the same as before. A new type of before. A new type of filehandle introduced in NFSv4.1 is the
filehandle introduced in NFS Version 4 is the "volatile" filehandle, "volatile" filehandle, which attempts to accommodate certain server
which attempts to accommodate certain server environments. environments.
The volatile filehandle type was introduced to address server The volatile filehandle type was introduced to address server
functionality or implementation issues which make correct functionality or implementation issues which make correct
implementation of a persistent filehandle infeasible. Some server implementation of a persistent filehandle infeasible. Some server
environments do not provide a file system level invariant that can be environments do not provide a file system level invariant that can be
used to construct a persistent filehandle. The underlying server used to construct a persistent filehandle. The underlying server
file system may not provide the invariant or the server's file system file system may not provide the invariant or the server's file system
programming interfaces may not provide access to the needed programming interfaces may not provide access to the needed
invariant. Volatile filehandles may ease the implementation of invariant. Volatile filehandles may ease the implementation of
server functionality such as hierarchical storage management or file server functionality such as hierarchical storage management or file
skipping to change at page 87, line 50 skipping to change at page 89, line 17
filehandles differently, a file attribute is defined which may be filehandles differently, a file attribute is defined which may be
used by the client to determine the filehandle types being returned used by the client to determine the filehandle types being returned
by the server. by the server.
4.2.1. General Properties of a Filehandle 4.2.1. General Properties of a Filehandle
The filehandle contains all the information the server needs to The filehandle contains all the information the server needs to
distinguish an individual file. To the client, the filehandle is distinguish an individual file. To the client, the filehandle is
opaque. The client stores filehandles for use in a later request and opaque. The client stores filehandles for use in a later request and
can compare two filehandles from the same server for equality by can compare two filehandles from the same server for equality by
doing an byte-by-byte comparison. However, the client MUST NOT doing a byte-by-byte comparison. However, the client MUST NOT
otherwise interpret the contents of filehandles. If two filehandles otherwise interpret the contents of filehandles. If two filehandles
from the same server are equal, they MUST refer to the same file. from the same server are equal, they MUST refer to the same file.
Servers SHOULD try to maintain a one-to-one correspondence between Servers SHOULD try to maintain a one-to-one correspondence between
filehandles and files but this is not required. Clients MUST use filehandles and files but this is not required. Clients MUST use
filehandle comparisons only to improve performance, not for correct filehandle comparisons only to improve performance, not for correct
behavior. All clients need to be prepared for situations in which it behavior. All clients need to be prepared for situations in which it
cannot be determined whether two filehandles denote the same object cannot be determined whether two filehandles denote the same object
and in such cases, avoid making invalid assumptions which might cause and in such cases, avoid making invalid assumptions which might cause
incorrect behavior. Further discussion of filehandle and attribute incorrect behavior. Further discussion of filehandle and attribute
comparison in the context of data caching is presented in the section comparison in the context of data caching is presented in the
"Data Caching and File Identity". Section 10.3.4.
As an example, in the case that two different path names when As an example, in the case that two different path names when
traversed at the server terminate at the same file system object, the traversed at the server terminate at the same file system object, the
server SHOULD return the same filehandle for each path. This can server SHOULD return the same filehandle for each path. This can
occur if a hard link is used to create two file names which refer to occur if a hard link is used to create two file names which refer to
the same underlying file object and associated data. For example, if the same underlying file object and associated data. For example, if
paths /a/b/c and /a/d/c refer to the same file, the server SHOULD paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
return the same filehandle for both path name traversals. return the same filehandle for both path name traversals.
4.2.2. Persistent Filehandle 4.2.2. Persistent Filehandle
skipping to change at page 91, line 32 skipping to change at page 92, line 51
GETFH GETFH
Note that the COMPOUND procedure does not provide atomicity. This Note that the COMPOUND procedure does not provide atomicity. This
example only reduces the overhead of recovering from an expired example only reduces the overhead of recovering from an expired
filehandle. filehandle.
5. File Attributes 5. File Attributes
To meet the requirements of extensibility and increased To meet the requirements of extensibility and increased
interoperability with non-UNIX platforms, attributes must be handled interoperability with non-UNIX platforms, attributes must be handled
in a flexible manner. The NFS version 3 fattr3 structure contains a in a flexible manner. The NFSv3 fattr3 structure contains a fixed
fixed list of attributes that not all clients and servers are able to list of attributes that not all clients and servers are able to
support or care about. The fattr3 structure can not be extended as support or care about. The fattr3 structure can not be extended as
new needs arise and it provides no way to indicate non-support. With new needs arise and it provides no way to indicate non-support. With
the NFS version 4 protocol, the client is able to query what attributes the NFSv4.1 protocol, the client is able to query what attributes the
the server supports and construct requests with only those supported server supports and construct requests with only those supported
attributes (or a subset thereof). attributes (or a subset thereof).
To this end, attributes are divided into three groups: mandatory, To this end, attributes are divided into three groups: mandatory,
recommended, and named. Both mandatory and recommended attributes recommended, and named. Both mandatory and recommended attributes
are supported in the NFS version 4 protocol by a specific and well- are supported in the NFSv4.1 protocol by a specific and well-defined
defined encoding and are identified by number. They are requested by encoding and are identified by number. They are requested by setting
setting a bit in the bit vector sent in the GETATTR request; the a bit in the bit vector sent in the GETATTR request; the server
server response includes a bit vector to list what attributes were response includes a bit vector to list what attributes were returned
returned in the response. New mandatory or recommended attributes in the response. New mandatory or recommended attributes may be
may be added to the NFS protocol between major revisions by added to the NFS protocol between major revisions by publishing a
publishing a standards-track RFC which allocates a new attribute standards-track RFC which allocates a new attribute number value and
number value and defines the encoding for the attribute. See the defines the encoding for the attribute. See Section 2.7 for further
section Minor Versioning (Section 2.7) for further discussion. discussion.
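As a rough illustration of the bit vector described above, the following Python sketch maps attribute numbers to bitmap words and back. It assumes the bit vector is carried as a sequence of 32-bit words in which attribute n occupies bit (n mod 32) of word (n div 32); that layout is not stated in this section, and the helper names are hypothetical rather than part of the protocol.

```python
# Illustrative only: helper names are hypothetical, and the word layout
# (attribute n -> bit n % 32 of word n // 32) is an assumption that this
# section does not state.

def attrs_to_bitmap(attr_ids):
    """Build a list of 32-bit words with one bit set per attribute number."""
    words = []
    for n in attr_ids:
        index, bit = divmod(n, 32)
        # Grow the word list until it can hold this attribute's word.
        while len(words) <= index:
            words.append(0)
        words[index] |= 1 << bit
    return words


def bitmap_to_attrs(words):
    """Return the sorted attribute numbers whose bits are set."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]


# Attribute numbers taken from the tables in Sections 5.5 and 5.6:
# type (1), size (4), filehandle (19), suppattr_exclcreat (75).
request = attrs_to_bitmap([1, 4, 19, 75])
```

Decoding a reply bitmap with bitmap_to_attrs gives the client the list of attributes the server actually returned, which may be shorter than the list requested.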
Named attributes are accessed by the new OPENATTR operation, which Named attributes are accessed by the new OPENATTR operation, which
accesses a hidden directory of attributes associated with a file accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named or READDIR and contains files whose names represent the named
attributes and whose data bytes are the value of the attribute. For attributes and whose data bytes are the value of the attribute. For
example: example:
skipping to change at page 92, line 45 skipping to change at page 94, line 15
Note that the hidden directory returned by OPENATTR is a convenience Note that the hidden directory returned by OPENATTR is a convenience
for protocol processing. The client should not make any assumptions for protocol processing. The client should not make any assumptions
about the server's implementation of named attributes and whether the about the server's implementation of named attributes and whether the
underlying file system at the server has a named attribute directory underlying file system at the server has a named attribute directory
or not. Therefore, operations such as SETATTR and GETATTR on the or not. Therefore, operations such as SETATTR and GETATTR on the
named attribute directory are undefined. named attribute directory are undefined.
5.1. Mandatory Attributes 5.1. Mandatory Attributes
These MUST be supported by every NFS version 4 client and server in These MUST be supported by every NFSv4.1 client and server in order
order to ensure a minimum level of interoperability. The server must to ensure a minimum level of interoperability. The server must store
store and return these attributes and the client must be able to and return these attributes and the client must be able to function
function with an attribute set limited to these attributes. With with an attribute set limited to these attributes. With just the
just the mandatory attributes some client functionality may be mandatory attributes some client functionality may be impaired or
impaired or limited in some ways. A client may ask for any of these limited in some ways. A client may ask for any of these attributes
attributes to be returned by setting a bit in the GETATTR request and to be returned by setting a bit in the GETATTR request and the server
the server must return their value. must return their value.
5.2. Recommended Attributes 5.2. Recommended Attributes
These attributes are understood well enough to warrant support in the These attributes are understood well enough to warrant support in the
NFS version 4 protocol. However, they may not be supported on all NFSv4.1 protocol. However, they may not be supported on all clients
clients and servers. A client may ask for any of these attributes to and servers. A client may ask for any of these attributes to be
be returned by setting a bit in the GETATTR request but must handle returned by setting a bit in the GETATTR request but must handle the
the case where the server does not return them. A client may ask for case where the server does not return them. A client may ask for the
the set of attributes the server supports and should not request set of attributes the server supports and should not request
attributes the server does not support. A server should be tolerant attributes the server does not support. A server should be tolerant
of requests for unsupported attributes and simply not return them of requests for unsupported attributes and simply not return them
rather than considering the request an error. It is expected that rather than considering the request an error. It is expected that
servers will support all attributes they comfortably can and only servers will support all attributes they comfortably can and only
fail to support attributes which are difficult to support in their fail to support attributes which are difficult to support in their
operating environments. A server should provide attributes whenever operating environments. A server should provide attributes whenever
they don't have to "tell lies" to the client. For example, a file they don't have to "tell lies" to the client. For example, a file
modification time should be either an accurate time or should not be modification time should be either an accurate time or should not be
supported by the server. This will not always be comfortable to supported by the server. This will not always be comfortable to
clients but the client is better positioned to decide whether and how to clients but the client is better positioned to decide whether and how to
fabricate or construct an attribute or whether to do without the fabricate or construct an attribute or whether to do without the
attribute. attribute.
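The client-side behavior described for recommended attributes can be sketched as follows. This is a minimal Python illustration, assuming attribute sets are represented as sets of attribute numbers; the function names are hypothetical and the supported set would in practice come from the server.

```python
# Illustrative only: function names and the set-of-integers representation
# are assumptions made for this sketch.

def plan_getattr(wanted, supported):
    """Split the wanted attribute numbers into those worth requesting and
    those the server has indicated it does not support."""
    wanted, supported = set(wanted), set(supported)
    return wanted & supported, wanted - supported


def missing_from_reply(requested, returned):
    """Attributes that were requested but are absent from the reply.

    The client has to cope with these (for example by doing without the
    attribute) rather than treating the reply as an error."""
    return set(requested) - set(returned)


# Example: the server supports a handful of attributes; the client would
# like size (4), fileid (20), and mimetype (32).
ask, skip = plan_getattr({4, 20, 32}, {0, 1, 3, 4, 20, 53})
```

Keeping the unsupported set separate lets the client decide per attribute whether to construct a substitute value or to do without.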
5.3. Named Attributes 5.3. Named Attributes
These attributes are not supported by direct encoding in the NFS These attributes are not supported by direct encoding in the NFSv4
Version 4 protocol but are accessed by string names rather than protocol but are accessed by string names rather than numbers and
numbers and correspond to an uninterpreted stream of bytes which are correspond to an uninterpreted stream of bytes which are stored with
stored with the file system object. The name space for these the file system object. The name space for these attributes may be
attributes may be accessed by using the OPENATTR operation. The accessed by using the OPENATTR operation. The OPENATTR operation
OPENATTR operation returns a filehandle for a virtual "attribute returns a filehandle for a virtual "named attribute directory" and
directory" and further perusal of the name space may be done using further perusal and modification of the name space may be done using
READDIR and LOOKUP operations on this filehandle. Named attributes operations that work on more typical directories. In particular,
may then be examined or changed by normal READ and WRITE and CREATE READDIR may be used to get a list of such named attributes and LOOKUP
operations on the filehandles returned from READDIR and LOOKUP. and OPEN may select a particular attribute. Creation of a new named
Named attributes may have attributes. attribute may be the result of an OPEN specifying file creation.
Once an OPEN is done, named attributes may be examined and changed by
normal READ and WRITE operations using the filehandles and stateids
returned by OPEN.
Named attributes and the named attribute directory may have
their own (non-named) attributes. Each of these objects must have all of
the mandatory attributes and may have additional recommended
attributes. However, the set of attributes for named attributes and
the named attribute directory need not be as large as, and typically
will not be as large as that for other objects in that file system.
Named attributes and the named attribute directory may be the target
of delegations (in the case of the named attribute directory these
will be directory delegations). However, since granting of
delegations or not is within the server's discretion, a server need
not support delegations on named attributes or the named attribute
directory.
It is recommended that servers support arbitrary named attributes. A It is recommended that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
in the server's file system. If a server does support named in the server's file system. If a server does support named
attributes, a client which is also able to handle them should be able attributes, a client which is also able to handle them should be able
to copy a file's data and meta-data with complete transparency from to copy a file's data and meta-data with complete transparency from
one location to another; this would imply that names allowed for one location to another; this would imply that names allowed for
regular directory entries are valid for named attribute names as regular directory entries are valid for named attribute names as
well. well.
In NFSv4.1, the structure of named attribute directories is
restricted in a number of ways, in order to prevent the development
of non-interoperable implementations in which some servers support a
fully general hierarchical directory structure for named attributes
while others support a more limited structure that is nonetheless
fully adequate to the feature's goals. In such an environment,
clients or applications might come to depend on non-portable
extensions. The restrictions are:
o CREATE is not allowed in a named attribute directory. Thus, such
objects as symbolic links and special files are not allowed to be
named attributes. Further, directories may not be created in a
named attribute directory so no hierarchical structure of named
attributes for a single object is allowed.
o OPENATTR may not be done on a named attribute directory or on a
named attribute. Thus, although these objects have attributes,
they may not have named attributes.
o Doing a RENAME of a named attribute to a different named attribute
directory or to an ordinary (i.e. non-named-attribute) directory
is not allowed.
o Creating hard links between named attribute directories or between
named attribute directories and ordinary directories is not
allowed.
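The restrictions listed above can be expressed as a small predicate. The Python sketch below is one possible reading, not a definitive server implementation: the function name, the string operation names, and the (type, identifier) tuples used to describe directories are all hypothetical. It treats a hard link within a single named attribute directory as not covered by the list, which is an interpretation of the text rather than something the text states.

```python
# Illustrative only: names and argument shapes are assumptions.
ATTRDIR = "NF4ATTRDIR"      # a named attribute directory
NAMEDATTR = "NF4NAMEDATTR"  # a named attribute


def is_restricted(op, target_type=None, src_dir=None, dst_dir=None):
    """Return True if the request falls under one of the listed restrictions.

    target_type: type of the object the operation is applied to
                 (for CREATE, the directory in which creation is attempted).
    src_dir, dst_dir: (type, identifier) tuples for RENAME and LINK.
    """
    if op == "CREATE":
        # CREATE is not allowed in a named attribute directory.
        return target_type == ATTRDIR
    if op == "OPENATTR":
        # No named attributes on attribute directories or named attributes.
        return target_type in (ATTRDIR, NAMEDATTR)
    if op == "RENAME":
        # A named attribute may not leave its own attribute directory.
        return src_dir[0] == ATTRDIR and src_dir != dst_dir
    if op == "LINK":
        # No hard links across directories when either is an attribute directory.
        return ATTRDIR in (src_dir[0], dst_dir[0]) and src_dir != dst_dir
    return False
```

For example, is_restricted("RENAME", src_dir=(ATTRDIR, 1), dst_dir=("NF4DIR", 7)) evaluates to True, while a rename within the same named attribute directory does not.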
Names of attributes will not be controlled by this document or other Names of attributes will not be controlled by this document or other
IETF standards track documents. See the section IANA Considerations IETF standards track documents. See Section 22.1 for further
(Section 22.1) for further discussion. discussion.
5.4. Classification of Attributes 5.4. Classification of Attributes
Each of the Mandatory and Recommended attributes can be classified in Each of the Mandatory and Recommended attributes can be classified in
one of three categories: per server, per file system, or per file one of three categories: per server, per file system, or per file
system object. Note that it is possible that some per file system system object. Note that it is possible that some per file system
attributes may vary within the file system. See the "homogeneous" attributes may vary within the file system. See the "homogeneous"
attribute for its definition. Note that the attributes attribute for its definition. Note that the attributes
time_access_set and time_modify_set are not listed in this section time_access_set and time_modify_set are not listed in this section
because they are write-only attributes corresponding to time_access because they are write-only attributes corresponding to time_access
and time_modify, and are used in a special instance of SETATTR. and time_modify, and are used in a special instance of SETATTR.
o The per server attribute is: o The per server attribute is:
lease_time lease_time
o The per file system attributes are: o The per file system attributes are:
supp_attr, fh_expire_type, link_support, symlink_support, supported_attrs, suppattr_exclcreat, fh_expire_type,
unique_handles, aclsupport, cansettime, case_insensitive, link_support, symlink_support, unique_handles, aclsupport,
case_preserving, chown_restricted, files_avail, files_free, cansettime, case_insensitive, case_preserving,
files_total, fs_locations, homogeneous, maxfilesize, maxname, chown_restricted, files_avail, files_free, files_total,
maxread, maxwrite, no_trunc, space_avail, space_free, fs_locations, homogeneous, maxfilesize, maxname, maxread,
space_total, time_delta, change_policy, fs_status, maxwrite, no_trunc, space_avail, space_free, space_total,
fs_layout_type, fs_locations_info time_delta, change_policy, fs_status, fs_layout_type,
fs_locations_info, fs_charset_cap
o The per file system object attributes are: o The per file system object attributes are:
type, change, size, named_attr, fsid, rdattr_error, filehandle, type, change, size, named_attr, fsid, rdattr_error, filehandle,
acl, archive, fileid, hidden, maxlink, mimetype, mode, acl, archive, fileid, hidden, maxlink, mimetype, mode,
numlinks, owner, owner_group, rawdev, space_used, system, numlinks, owner, owner_group, rawdev, space_used, system,
time_access, time_backup, time_create, time_metadata, time_access, time_backup, time_create, time_metadata,
time_modify, mounted_on_fileid, dir_notif_delay, time_modify, mounted_on_fileid, dir_notif_delay,
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
layout_blksize, layout_alignment, mdsthreshold, retention_get, layout_blksize, layout_alignment, mdsthreshold, retention_get,
retention_set, retentevt_get, retentevt_set, retention_hold, retention_set, retentevt_get, retentevt_set, retention_hold,
mode_set_masked mode_set_masked
For quota_avail_hard, quota_avail_soft, and quota_used see their For quota_avail_hard, quota_avail_soft, and quota_used see their
definitions below for the appropriate classification. definitions below for the appropriate classification.
5.5. Mandatory Attributes - List and Definition References 5.5. Mandatory Attributes - List and Definition References
+-----------------+----+------------+------+----------------+ +--------------------+----+------------+------+----------------+
| name | Id | Data Type | Acc. | Defined in: | | name | Id | Data Type | Acc. | Defined in: |
+-----------------+----+------------+------+----------------+ +--------------------+----+------------+------+----------------+
| supp_attr | 0 | bitmap | RD | Section 5.7.1 | | supported_attrs | 0 | bitmap4 | RD | Section 5.7.1 |
| type | 1 | nfs_ftype4 | RD | Section 5.7.2 | | type | 1 | nfs_ftype4 | RD | Section 5.7.3 |
| fh_expire_type | 2 | uint32 | RD | Section 5.7.3 | | fh_expire_type | 2 | uint32_t | RD | Section 5.7.4 |
| change | 3 | uint64 | RD | Section 5.7.4 | | change | 3 | uint64_t | RD | Section 5.7.5 |
| size | 4 | uint64 | R/W | Section 5.7.5 | | size | 4 | uint64_t | R/W | Section 5.7.6 |
| link_support | 5 | bool | RD | Section 5.7.6 | | link_support | 5 | bool | RD | Section 5.7.7 |
| symlink_support | 6 | bool | RD | Section 5.7.7 | | symlink_support | 6 | bool | RD | Section 5.7.8 |
| named_attr | 7 | bool | RD | Section 5.7.8 | | named_attr | 7 | bool | RD | Section 5.7.9 |
| fsid | 8 | fsid4 | RD | Section 5.7.9 | | fsid | 8 | fsid4 | RD | Section 5.7.10 |
| unique_handles | 9 | bool | RD | Section 5.7.10 | | unique_handles | 9 | bool | RD | Section 5.7.11 |
| lease_time | 10 | nfs_lease4 | RD | Section 5.7.11 | | lease_time | 10 | nfs_lease4 | RD | Section 5.7.12 |
| rdattr_error | 11 | enum | RD | Section 5.7.12 | | rdattr_error | 11 | enum | RD | Section 5.7.13 |
| filehandle | 19 | nfs_fh4 | RD | Section 5.7.13 | | filehandle | 19 | nfs_fh4 | RD | Section 5.7.14 |
+-----------------+----+------------+------+----------------+ | suppattr_exclcreat | 75 | bitmap4 | RD | Section 5.7.2 |
+--------------------+----+------------+------+----------------+
5.6. Recommended Attributes - List and Definition References 5.6. Recommended Attributes - List and Definition References
+--------------------+----+----------------+------+-----------------+ +--------------------+----+----------------+------+-----------------+
| name | Id | Data Type | Acc. | Defined in: | | name | Id | Data Type | Acc. | Defined in: |
+--------------------+----+----------------+------+-----------------+ +--------------------+----+----------------+------+-----------------+
| acl | 12 | nfsace4<> | R/W | Section 6.2.1 | | acl | 12 | nfsace4<> | R/W | Section 6.2.1 |
| aclsupport | 13 | uint32 | RD | Section 6.2.1.2 | | aclsupport | 13 | uint32_t | RD | Section 6.2.1.2 |
| archive | 14 | bool | R/W | Section 5.7.14 | | archive | 14 | bool | R/W | Section 5.7.15 |
| cansettime | 15 | bool | RD | Section 5.7.15 | | cansettime | 15 | bool | RD | Section 5.7.16 |
| case_insensitive | 16 | bool | RD | Section 5.7.16 | | case_insensitive | 16 | bool | RD | Section 5.7.17 |
| change_policy | 60 | chg_policy4 | RD | Section 5.7.17 | | case_preserving | 17 | bool | RD | Section 5.7.19 |
| case_preserving | 17 | bool | RD | Section 5.7.18 | | change_policy | 60 | chg_policy4 | RD | Section 5.7.18 |
| chown_restricted | 18 | bool | RD | Section 5.7.19 | | chown_restricted | 18 | bool | RD | Section 5.7.20 |
| dacl | 58 | nfsacl41 | R/W | Section 6.2.2 | | dacl | 58 | nfsacl41 | R/W | Section 6.2.2 |
| dir_notif_delay | 56 | nfstime4 | RD | Section 5.10.1 | | dir_notif_delay | 56 | nfstime4 | RD | Section 5.10.1 |
| dirent_notif_delay | 57 | nfstime4 | RD | Section 5.10.2 | | dirent_notif_delay | 57 | nfstime4 | RD | Section 5.10.2 |
| fileid | 20 | uint64 | RD | Section 5.7.20 | | fileid | 20 | uint64_t | RD | Section 5.7.21 |
| files_avail | 21 | uint64 | RD | Section 5.7.21 | | files_avail | 21 | uint64_t | RD | Section 5.7.22 |
| files_free | 22 | uint64 | RD | Section 5.7.22 | | files_free | 22 | uint64_t | RD | Section 5.7.23 |
| files_total | 23 | uint64 | RD | Section 5.7.23 | | files_total | 23 | uint64_t | RD | Section 5.7.24 |
| fs_charset_cap | 76 | uint32_t | RD | Section 5.7.25 |
| fs_layout_type | 62 | layouttype4<> | RD | Section 5.11.1 | | fs_layout_type | 62 | layouttype4<> | RD | Section 5.11.1 |
| fs_locations | 24 | fs_locations | RD | Section 5.7.24 | | fs_locations | 24 | fs_locations | RD | Section 5.7.26 |
| fs_locations_info | 67 | | RD | Section 5.7.25 | | fs_locations_info | 67 | | RD | Section 5.7.27 |
| fs_status | 61 | fs4_status | RD | Section 5.7.26 | | fs_status | 61 | fs4_status | RD | Section 5.7.28 |
| hidden | 25 | bool | R/W | Section 5.7.27 | | hidden | 25 | bool | R/W | Section 5.7.29 |
| homogeneous | 26 | bool | RD | Section 5.7.28 | | homogeneous | 26 | bool | RD | Section 5.7.30 |
| layout_alignment | 66 | uint32 | RD | Section 5.11.2 | | layout_alignment | 66 | uint32_t | RD | Section 5.11.2 |
| layout_blksize | 65 | uint32 | RD | Section 5.11.3 | | layout_blksize | 65 | uint32_t | RD | Section 5.11.3 |
| layout_hint | 63 | layouthint4 | WRT | Section 5.11.4 | | layout_hint | 63 | layouthint4 | WRT | Section 5.11.4 |
| layout_type | 64 | layouttype4<> | RD | Section 5.11.5 | | layout_type | 64 | layouttype4<> | RD | Section 5.11.5 |
| maxfilesize | 27 | uint64 | RD | Section 5.7.29 | | maxfilesize | 27 | uint64_t | RD | Section 5.7.31 |
| maxlink | 28 | uint32 | RD | Section 5.7.30 | | maxlink | 28 | uint32_t | RD | Section 5.7.32 |
| maxname | 29 | uint32 | RD | Section 5.7.31 | | maxname | 29 | uint32_t | RD | Section 5.7.33 |
| maxread | 30 | uint64 | RD | Section 5.7.32 | | maxread | 30 | uint64_t | RD | Section 5.7.34 |
| maxwrite | 31 | uint64 | RD | Section 5.7.33 | | maxwrite | 31 | uint64_t | RD | Section 5.7.35 |
| mdsthreshold | 68 | mdsthreshold4 | RD | Section 5.11.6 | | mdsthreshold | 68 | mdsthreshold4 | RD | Section 5.11.6 |
| mimetype | 32 | utf8<> | R/W | Section 5.7.34 | | mimetype | 32 | utf8<> | R/W | Section 5.7.36 |
| mode | 33 | mode4 | R/W | Section 6.2.4 | | mode | 33 | mode4 | R/W | Section 6.2.4 |
| mode_set_masked | 74 | mode_masked4 | WRT | Section 6.2.5 | | mode_set_masked | 74 | mode_masked4 | WRT | Section 6.2.5 |
| mounted_on_fileid | 55 | uint64 | RD | Section 5.7.35 | | mounted_on_fileid | 55 | uint64_t | RD | Section 5.7.37 |
| no_trunc | 34 | bool | RD | Section 5.7.36 | | no_trunc | 34 | bool | RD | Section 5.7.38 |
| numlinks | 35 | uint32 | RD | Section 5.7.37 | | numlinks | 35 | uint32_t | RD | Section 5.7.39 |
| owner | 36 | utf8<> | R/W | Section 5.7.38 | | owner | 36 | utf8<> | R/W | Section 5.7.40 |
| owner_group | 37 | utf8<> | R/W | Section 5.7.39 | | owner_group | 37 | utf8<> | R/W | Section 5.7.41 |
| quota_avail_hard | 38 | uint64 | RD | Section 5.7.40 | | quota_avail_hard | 38 | uint64_t | RD | Section 5.7.42 |
| quota_avail_soft | 39 | uint64 | RD | Section 5.7.41 | | quota_avail_soft | 39 | uint64_t | RD | Section 5.7.43 |
| quota_used | 40 | uint64 | RD | Section 5.7.42 | | quota_used | 40 | uint64_t | RD | Section 5.7.44 |
| rawdev | 41 | specdata4 | RD | Section 5.7.43 | | rawdev | 41 | specdata4 | RD | Section 5.7.45 |
| retentevt_get | 71 | retention_get4 | RD | Section 5.12.3 | | retentevt_get | 71 | retention_get4 | RD | Section 5.12.3 |
| retentevt_set | 72 | retention_set4 | WRT | Section 5.12.4 | | retentevt_set | 72 | retention_set4 | WRT | Section 5.12.4 |
| retention_get | 69 | retention_get4 | RD | Section 5.12.1 | | retention_get | 69 | retention_get4 | RD | Section 5.12.1 |
| retention_hold | 73 | uint64 | R/W | Section 5.12.5 | | retention_hold | 73 | uint64_t | R/W | Section 5.12.5 |
| retention_set | 70 | retention_set4 | WRT | Section 5.12.2 | | retention_set | 70 | retention_set4 | WRT | Section 5.12.2 |
| sacl | 59 | nfsacl41 | R/W | Section 6.2.3 | | sacl | 59 | nfsacl41 | R/W | Section 6.2.3 |
| space_avail | 42 | uint64 | RD | Section 5.7.44 | | space_avail | 42 | uint64_t | RD | Section 5.7.46 |
| space_free | 43 | uint64 | RD | Section 5.7.45 | | space_free | 43 | uint64_t | RD | Section 5.7.47 |
| space_total | 44 | uint64 | RD | Section 5.7.46 | | space_total | 44 | uint64_t | RD | Section 5.7.48 |
| space_used | 45 | uint64 | RD | Section 5.7.47 | | space_used | 45 | uint64_t | RD | Section 5.7.49 |
| system | 46 | bool | R/W | Section 5.7.48 | | system | 46 | bool | R/W | Section 5.7.50 |
| time_access | 47 | nfstime4 | RD | Section 5.7.49 | | time_access | 47 | nfstime4 | RD | Section 5.7.51 |
| time_access_set | 48 | settime4 | WRT | Section 5.7.50 | | time_access_set | 48 | settime4 | WRT | Section 5.7.52 |
| time_backup | 49 | nfstime4 | R/W | Section 5.7.51 | | time_backup | 49 | nfstime4 | R/W | Section 5.7.53 |
| time_create | 50 | nfstime4 | R/W | Section 5.7.52 | | time_create | 50 | nfstime4 | R/W | Section 5.7.54 |
| time_delta | 51 | nfstime4 | RD | Section 5.7.53 | | time_delta | 51 | nfstime4 | RD | Section 5.7.55 |
| time_metadata | 52 | nfstime4 | RD | Section 5.7.54 | | time_metadata | 52 | nfstime4 | RD | Section 5.7.56 |
| time_modify | 53 | nfstime4 | RD | Section 5.7.55 | | time_modify | 53 | nfstime4 | RD | Section 5.7.57 |
| time_modify_set | 54 | settime4 | WRT | Section 5.7.56 | | time_modify_set | 54 | settime4 | WRT | Section 5.7.58 |
+--------------------+----+----------------+------+-----------------+ +--------------------+----+----------------+------+-----------------+
5.7. Attribute Definitions 5.7. Attribute Definitions
5.7.1. Attribute 0: supp_attr 5.7.1. Attribute 0: supported_attrs
The bit vector which would retrieve all mandatory and recommended The bit vector which would retrieve all mandatory and recommended
attributes that are supported for this object. The scope of this attributes that are supported for this object. The scope of this
attribute applies to all objects with a matching fsid. attribute applies to all objects with a matching fsid.
5.7.2. Attribute 1: type 5.7.2. Attribute 75: suppattr_exclcreat
The type of the object (file, directory, symlink, etc.) The bit vector which would set all mandatory and recommended
attributes that are supported by the EXCLUSIVE4_1 method of file
creation via the OPEN operation. The scope of this attribute applies
to all objects with a matching fsid.
5.7.3. Attribute 2: fh_expire_type 5.7.3. Attribute 1: type
Designates the type of an object in terms of one of a number of
special constants:
o NF4REG designates a regular file.
o NF4DIR designates a directory.
o NF4BLK designates a block device special file.
o NF4CHR designates a character device special file.
o NF4LNK designates a symbolic link.
o NF4SOCK designates a named socket special file.
o NF4FIFO designates a fifo special file.
o NF4ATTRDIR designates a named attribute directory.
o NF4NAMEDATTR designates a named attribute.
Within the explanatory text and operation descriptions, the following
phrases will be used with the meanings given below:
o The phrase "is a directory" means that the object is of type
NF4DIR or of type NF4ATTRDIR.
o The phrase "is a special file" means that the object is of one of
the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO.
o The phrase "is an ordinary file" means that the object is of type
NF4REG or of type NF4NAMEDATTR.
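The three phrases defined above amount to simple predicates over the object type. The Python sketch below shows one way to write them; the numeric values assigned to the constants are an assumption (they are not given in this section and should be checked against the protocol's XDR definition), and the class and function names are hypothetical.

```python
from enum import IntEnum


class NfsFtype4(IntEnum):
    # Numeric values are assumed for illustration; this section names the
    # constants but does not give their values.
    NF4REG = 1
    NF4DIR = 2
    NF4BLK = 3
    NF4CHR = 4
    NF4LNK = 5
    NF4SOCK = 6
    NF4FIFO = 7
    NF4ATTRDIR = 8
    NF4NAMEDATTR = 9


def is_directory(t):
    """'is a directory': NF4DIR or NF4ATTRDIR."""
    return t in (NfsFtype4.NF4DIR, NfsFtype4.NF4ATTRDIR)


def is_special_file(t):
    """'is a special file': NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO."""
    return t in (NfsFtype4.NF4BLK, NfsFtype4.NF4CHR,
                 NfsFtype4.NF4SOCK, NfsFtype4.NF4FIFO)


def is_ordinary_file(t):
    """'is an ordinary file': NF4REG or NF4NAMEDATTR."""
    return t in (NfsFtype4.NF4REG, NfsFtype4.NF4NAMEDATTR)
```

Note that a symbolic link (NF4LNK) falls into none of the three groups.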
5.7.4. Attribute 2: fh_expire_type
Server uses this to specify filehandle expiration behavior to the Server uses this to specify filehandle expiration behavior to the
client. See the section "Filehandles" for additional description. client. See Section 4 for additional description.
5.7.4. Attribute 3: change 5.7.5. Attribute 3: change
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
file data, directory contents or attributes of the object have been file data, directory contents or attributes of the object have been
modified. The server may return the object's time_metadata attribute modified. The server may return the object's time_metadata attribute
for this attribute's value but only if the file system object can not for this attribute's value but only if the file system object can not
be updated more frequently than the resolution of time_metadata. be updated more frequently than the resolution of time_metadata.
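A client might use the change attribute along the following lines. This is a minimal Python sketch, assuming the client keeps the change value alongside its cached copy of an object; the class and function names are hypothetical. Only equality is tested, since the text describes the value as a means of detecting modification and nothing more.

```python
# Illustrative only: CachedObject and cache_is_current are hypothetical names.

class CachedObject:
    """Client-side copy of an object together with the change value that
    was current when the copy was made."""

    def __init__(self, change, data):
        self.change = change
        self.data = data


def cache_is_current(cached, server_change):
    """True if the cached copy may still be used.

    Any difference in the change value means the file data, directory
    contents, or attributes may have been modified."""
    return cached is not None and cached.change == server_change


entry = CachedObject(change=1001, data=b"hello")
```

When cache_is_current returns False, the client would discard or refresh the cached copy and record the new change value.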
5.7.5. Attribute 4: size 5.7.6. Attribute 4: size
The size of the object in bytes. The size of the object in bytes.
5.7.6. Attribute 5: link_support 5.7.7. Attribute 5: link_support
True, if the object's file system supports hard links. True, if the object's file system supports hard links.
5.7.7. Attribute 6: symlink_support 5.7.8. Attribute 6: symlink_support
True, if the object's file system supports symbolic links. True, if the object's file system supports symbolic links.
5.7.8. Attribute 7: named_attr 5.7.9. Attribute 7: named_attr
True, if this object has named attributes. In other words, the object True, if this object has named attributes. In other words, the object
has a non-empty named attribute directory. has a non-empty named attribute directory.
5.7.9. Attribute 8: fsid 5.7.10. Attribute 8: fsid
Unique file system identifier for the file system holding this Unique file system identifier for the file system holding this
object. fsid contains major and minor components each of which is object. fsid contains major and minor components each of which is
uint64. uint64_t.
5.7.10. Attribute 9: unique_handles 5.7.11. Attribute 9: unique_handles
True, if two distinct filehandles are guaranteed to refer to two True, if two distinct filehandles are guaranteed to refer to two
different file system objects. different file system objects.
5.7.11. Attribute 10: lease_time 5.7.12. Attribute 10: lease_time
Duration of leases at server in seconds. Duration of leases at server in seconds.
5.7.12. Attribute 11: rdattr_error 5.7.13. Attribute 11: rdattr_error
Error returned from getattr during readdir. Error returned from getattr during readdir.
5.7.13. Attribute 19: filehandle 5.7.14. Attribute 19: filehandle
The filehandle of this object (primarily for readdir requests). The filehandle of this object (primarily for readdir requests).
5.7.14. Attribute 14: archive 5.7.15. Attribute 14: archive
True, if this file has been archived since the time of last True, if this file has been archived since the time of last
modification (deprecated in favor of time_backup). modification (deprecated in favor of time_backup).
5.7.15. Attribute 15: cansettime 5.7.16. Attribute 15: cansettime
True, if the server is able to change the times for a file system object True, if the server is able to change the times for a file system object
as specified in a SETATTR operation. as specified in a SETATTR operation.
5.7.16. Attribute 16: case_insensitive 5.7.17. Attribute 16: case_insensitive
True, if filename comparisons on this file system are case True, if filename comparisons on this file system are case
insensitive. insensitive.
5.7.17. Attribute 60: change_policy 5.7.18. Attribute 60: change_policy
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
some server policy related to the current file system has been some server policy related to the current file system has been
subject to change. If the value remains the same then the client can subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and be sure that the values of the attributes related to fs location and
the fsstat_type field of the fs_status attribute have not changed. the fsstat_type field of the fs_status attribute have not changed.
See Section 3.2.6 for details. On the other hand, a change in this value does not necessarily imply a
change in policy. It is up to the client to interrogate the server
to determine if some policy relevant to it has changed. See
Section 3.3.6 for details.
5.7.18. Attribute 17: case_preserving This attribute MUST change when the value returned by the
fs_locations or fs_locations_info attribute changes, when a file
system goes from read-only to writable or vice versa, or when the
allowable set of security flavors for the file system or any part
thereof is changed.
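As an illustration of how a client might use change_policy, the following is a minimal sketch, not part of the protocol definition. It assumes the client keeps a per-file-system cache of the last change_policy value it saw and re-fetches the policy-related attributes only when that value differs. The names FsPolicyCache, refresh_if_policy_changed and fetch_policy_attrs are hypothetical, and the attribute value is treated as an opaque comparable object.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class FsPolicyCache:
    """Hypothetical per-file-system cache kept by a client."""
    change_policy: Optional[Any] = None          # last value seen, opaque
    attrs: Dict[str, Any] = field(default_factory=dict)


def refresh_if_policy_changed(
    cache: FsPolicyCache,
    new_change_policy: Any,
    fetch_policy_attrs: Callable[[], Dict[str, Any]],
) -> bool:
    """Re-fetch policy attributes only when change_policy differs.

    fetch_policy_attrs stands in for a GETATTR of the attributes the
    client cares about (for example fs_locations or fs_status).
    Returns True when a refresh was performed.
    """
    if cache.change_policy == new_change_policy:
        # Same value: the fs location attributes and the fs_status
        # type are known to be unchanged, so nothing is fetched.
        return False
    cache.attrs = fetch_policy_attrs()
    cache.change_policy = new_change_policy
    return True
```

An unchanged value lets the client skip the more expensive attribute fetch. A changed value only tells the client that it needs to interrogate the server.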
5.7.19. Attribute 17: case_preserving
True, if filename case on this file system is preserved. True, if filename case on this file system is preserved.
5.7.19. Attribute 18: chown_restricted 5.7.20. Attribute 18: chown_restricted
If TRUE, the server will reject any request to change either the If TRUE, the server will reject any request to change either the
owner or the group associated with a file if the caller is not a owner or the group associated with a file if the caller is not a
privileged user (for example, "root" in UNIX operating environments privileged user (for example, "root" in UNIX operating environments
or in Windows 2000 the "Take Ownership" privilege). or in Windows 2000 the "Take Ownership" privilege).
5.7.20. Attribute 20: fileid 5.7.21. Attribute 20: fileid
A number uniquely identifying the file within the file system. A number uniquely identifying the file within the file system.
5.7.21. Attribute 21: files_avail 5.7.22. Attribute 21: files_avail
File slots available to this user on the file system containing this File slots available to this user on the file system containing this
object - this should be the smallest relevant limit. object - this should be the smallest relevant limit.
5.7.22. Attribute 22: files_free 5.7.23. Attribute 22: files_free
Free file slots on the file system containing this object - this Free file slots on the file system containing this object - this
should be the smallest relevant limit. should be the smallest relevant limit.
5.7.23. Attribute 23: files_total 5.7.24. Attribute 23: files_total
Total file slots on the file system containing this object. Total file slots on the file system containing this object.
5.7.24. Attribute 24: fs_locations 5.7.25. Attribute 76: fs_charset_cap
Character set capabilities for this file system. See Section 14.4.
5.7.26. Attribute 24: fs_locations
Locations where this file system may be found. If the server returns Locations where this file system may be found. If the server returns
NFS4ERR_MOVED as an error, this attribute MUST be supported. NFS4ERR_MOVED as an error, this attribute MUST be supported.
5.7.25. Attribute 67: fs_locations_info 5.7.27. Attribute 67: fs_locations_info
Full function file system location. Full function file system location.
5.7.26. Attribute 61: fs_status 5.7.28. Attribute 61: fs_status
Generic file system type information. Generic file system type information.
5.7.27. Attribute 25: hidden 5.7.29. Attribute 25: hidden
True, if the file is considered hidden with respect to the Windows True, if the file is considered hidden with respect to the Windows
API. API.
5.7.28. Attribute 26: homogeneous 5.7.30. Attribute 26: homogeneous
True, if this object's file system is homogeneous, i.e. the per file True, if this object's file system is homogeneous, i.e. the per file
system attributes are the same for all of the file system's objects. system attributes are the same for all of the file system's objects.
5.7.29. Attribute 27: maxfilesize 5.7.31. Attribute 27: maxfilesize
Maximum supported file size for the file system of this object. Maximum supported file size for the file system of this object.
5.7.30. Attribute 28: maxlink 5.7.32. Attribute 28: maxlink
Maximum number of links for this object. Maximum number of links for this object.
5.7.31. Attribute 29: maxname 5.7.33. Attribute 29: maxname
Maximum filename size supported for this object. Maximum filename size supported for this object.
5.7.32. Attribute 30: maxread 5.7.34. Attribute 30: maxread
Maximum read size supported for this object. Maximum read size supported for this object.
5.7.33. Attribute 31: maxwrite 5.7.35. Attribute 31: maxwrite
Maximum write size supported for this object. This attribute SHOULD Maximum write size supported for this object. This attribute SHOULD
be supported if the file is writable. Lack of this attribute can be supported if the file is writable. Lack of this attribute can
lead to the client either wasting bandwidth or not receiving the best lead to the client either wasting bandwidth or not receiving the best
performance. performance.
5.7.34. Attribute 32: mimetype 5.7.36. Attribute 32: mimetype
MIME body type/subtype of this object. MIME body type/subtype of this object.
5.7.35. Attribute 55: mounted_on_fileid 5.7.37. Attribute 55: mounted_on_fileid
Like fileid, but if the target filehandle is the root of a file Like fileid, but if the target filehandle is the root of a file
system return the fileid of the underlying directory. system return the fileid of the underlying directory.
UNIX-based operating environments connect a file system into the UNIX-based operating environments connect a file system into the
namespace by connecting (mounting) the file system onto the existing namespace by connecting (mounting) the file system onto the existing
file object (the mount point, usually a directory) of an existing file object (the mount point, usually a directory) of an existing
file system. When the mount point's parent directory is read via an file system. When the mount point's parent directory is read via an
API like readdir(), the return results are directory entries, each API like readdir(), the return results are directory entries, each
with a component name and a fileid. The fileid of the mount point's with a component name and a fileid. The fileid of the mount point's
directory entry will be different from the fileid that the stat() directory entry will be different from the fileid that the stat()
system call returns. The stat() system call is returning the fileid system call returns. The stat() system call is returning the fileid
of the root of the mounted file system, whereas readdir() is of the root of the mounted file system, whereas readdir() is
returning the fileid stat() would have returned before any file returning the fileid stat() would have returned before any file
systems were mounted on the mount point. systems were mounted on the mount point.
Unlike NFS version 3, NFS version 4 allows a client's LOOKUP request Unlike NFSv3, NFSv4.1 allows a client's LOOKUP request to cross other
to cross other file systems. The client detects the file system file systems. The client detects the file system crossing whenever
crossing whenever the filehandle argument of LOOKUP has an fsid the filehandle argument of LOOKUP has an fsid attribute different
attribute different from that of the filehandle returned by LOOKUP. from that of the filehandle returned by LOOKUP. A UNIX-based client
A UNIX-based client will consider this a "mount point crossing". will consider this a "mount point crossing". UNIX has a legacy
UNIX has a legacy scheme for allowing a process to determine its scheme for allowing a process to determine its current working
current working directory. This relies on readdir() of a mount directory. This relies on readdir() of a mount point's parent and
point's parent and stat() of the mount point returning fileids as stat() of the mount point returning fileids as previously described.
previously described. The mounted_on_fileid attribute corresponds to The mounted_on_fileid attribute corresponds to the fileid that
the fileid that readdir() would have returned as described readdir() would have returned as described previously.
previously.
While the NFS version 4 client could simply fabricate a fileid While the NFSv4.1 client could simply fabricate a fileid
corresponding to what mounted_on_fileid provides (and if the server corresponding to what mounted_on_fileid provides (and if the server
does not support mounted_on_fileid, the client has no choice), there does not support mounted_on_fileid, the client has no choice), there
is a risk that the client will generate a fileid that conflicts with is a risk that the client will generate a fileid that conflicts with
one that is already assigned to another object in the file system. one that is already assigned to another object in the file system.
Instead, if the server can provide the mounted_on_fileid, the Instead, if the server can provide the mounted_on_fileid, the
potential for client operational problems in this area is eliminated. potential for client operational problems in this area is eliminated.
If the server detects that there is no mounted point at the target If the server detects that there is no mounted point at the target
file object, then the value for mounted_on_fileid that it returns is file object, then the value for mounted_on_fileid that it returns is
the same as that of the fileid attribute. the same as that of the fileid attribute.
skipping to change at page 101, line 35 skipping to change at page 104, line 48
fileid of a directory entry returned by readdir(). If fileid of a directory entry returned by readdir(). If
mounted_on_fileid is requested in a GETATTR operation, the server mounted_on_fileid is requested in a GETATTR operation, the server
should obey an invariant that has it returning a value that is equal should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e. to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments what readdir() would have returned. Some operating environments
allow a series of two or more file systems to be mounted onto a allow a series of two or more file systems to be mounted onto a
single mount point. In this case, for the server to obey the single mount point. In this case, for the server to obey the
aforementioned invariant, it will need to find the base mount point, aforementioned invariant, it will need to find the base mount point,
and not the intermediate mount points. and not the intermediate mount points.
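The client-side handling described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: attributes are assumed to arrive as a plain dictionary, fsid is assumed to be a comparable pair, and the function names are hypothetical.

```python
from typing import Any, Dict, Tuple

Fsid = Tuple[int, int]  # assumed representation of the fsid attribute


def crossed_file_system(parent_fsid: Fsid, child_fsid: Fsid) -> bool:
    """A LOOKUP crossed into another file system when the fsid of the
    directory filehandle differs from the fsid of the returned one."""
    return parent_fsid != child_fsid


def dirent_fileid(attrs: Dict[str, Any]) -> int:
    """Pick the fileid to report for a directory entry.

    Prefer mounted_on_fileid when the server returned it; otherwise
    fall back to fileid. When nothing is mounted on the object, the
    server returns the same value for both attributes.
    """
    return attrs.get("mounted_on_fileid", attrs["fileid"])
```

Falling back to fileid is only a stand-in for servers that do not support mounted_on_fileid. As the text notes, any value the client fabricates in that case may conflict with a fileid already in use.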
5.7.36. Attribute 34: no_trunc 5.7.38. Attribute 34: no_trunc
True, if a name longer than name_max is used, an error will be returned True, if a name longer than name_max is used, an error will be returned
and the name is not truncated. and the name is not truncated.
5.7.37. Attribute 35: numlinks 5.7.39. Attribute 35: numlinks
Number of hard links to this object. Number of hard links to this object.
5.7.38. Attribute 36: owner 5.7.40. Attribute 36: owner
The string name of the owner of this object. The string name of the owner of this object.
5.7.39. Attribute 37: owner_group 5.7.41. Attribute 37: owner_group
The string name of the group ownership of this object. The string name of the group ownership of this object.
5.7.40. Attribute 38: quota_avail_hard 5.7.42. Attribute 38: quota_avail_hard
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space beyond the current allocation that can be allocated to this space beyond the current allocation that can be allocated to this
file or directory before further allocations will be refused. It is file or directory before further allocations will be refused. It is
understood that this space may be consumed by allocations to other understood that this space may be consumed by allocations to other
files or directories. files or directories.
5.7.41. Attribute 39: quota_avail_soft 5.7.43. Attribute 39: quota_avail_soft
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space that can be allocated to this file or directory before the user space that can be allocated to this file or directory before the user
may reasonably be warned. It is understood that this space may be may reasonably be warned. It is understood that this space may be
consumed by allocations to other files or directories though there is consumed by allocations to other files or directories though there is
a rule as to which other files or directories. a rule as to which other files or directories.
5.7.42. Attribute 40: quota_used 5.7.44. Attribute 40: quota_used
The value in bytes which represents the amount of disk space used by The value in bytes which represents the amount of disk space used by
this file or directory and possibly a number of other similar files this file or directory and possibly a number of other similar files
or directories, where the set of "similar" meets at least the or directories, where the set of "similar" meets at least the
criterion that allocating space to any file or directory in the set criterion that allocating space to any file or directory in the set
will reduce the "quota_avail_hard" of every other file or directory will reduce the "quota_avail_hard" of every other file or directory
in the set. in the set.
Note that there may be a number of distinct but overlapping sets of Note that there may be a number of distinct but overlapping sets of
files or directories for which a quota_used value is maintained. files or directories for which a quota_used value is maintained.
E.g. "all files with a given owner", "all files with a given group E.g. "all files with a given owner", "all files with a given group
owner", etc. owner", etc.
The server is at liberty to choose any of those sets but should do so The server is at liberty to choose any of those sets but should do so
in a repeatable way. The rule may be configured per file system or in a repeatable way. The rule may be configured per file system or
may be "choose the set with the smallest quota". may be "choose the set with the smallest quota".
5.7.43. Attribute 41: rawdev 5.7.45. Attribute 41: rawdev
Raw device identifier. UNIX device major/minor node information. If Raw device identifier. UNIX device major/minor node information. If
the value of type is not NF4BLK or NF4CHR, the value returned SHOULD the value of type is not NF4BLK or NF4CHR, the value returned SHOULD
NOT be considered useful. NOT be considered useful.
5.7.44. Attribute 42: space_avail 5.7.46. Attribute 42: space_avail
Disk space in bytes available to this user on the file system Disk space in bytes available to this user on the file system
containing this object - this should be the smallest relevant limit. containing this object - this should be the smallest relevant limit.
5.7.45. Attribute 43: space_free 5.7.47. Attribute 43: space_free
Free disk space in bytes on the file system containing this object - Free disk space in bytes on the file system containing this object -
this should be the smallest relevant limit. this should be the smallest relevant limit.
5.7.46. Attribute 44: space_total 5.7.48. Attribute 44: space_total
Total disk space in bytes on the file system containing this object. Total disk space in bytes on the file system containing this object.
5.7.47. Attribute 45: space_used 5.7.49. Attribute 45: space_used
Number of file system bytes allocated to this object. Number of file system bytes allocated to this object.
5.7.48. Attribute 46: system 5.7.50. Attribute 46: system
True, if this file is a "system" file with respect to the Windows True, if this file is a "system" file with respect to the Windows
API. API.
5.7.49. Attribute 47: time_access 5.7.51. Attribute 47: time_access
The time_access attribute represents the time of last access to the The time_access attribute represents the time of last access to the
object by a read that was satisfied by the server. The notion of object by a read that was satisfied by the server. The notion of
what is an "access" depends on the server's operating environment and/or what is an "access" depends on the server's operating environment and/or
the server's file system semantics. For example, for servers obeying the server's file system semantics. For example, for servers obeying
POSIX semantics, time_access would be updated only by the READLINK, POSIX semantics, time_access would be updated only by the READLINK,
READ, and READDIR operations and not any of the operations that READ, and READDIR operations and not any of the operations that
modify the content of the object. Of course, setting the modify the content of the object. Of course, setting the
corresponding time_access_set attribute is another way to modify the corresponding time_access_set attribute is another way to modify the
time_access attribute. time_access attribute.
Whenever the file object resides on a writable file system, the Whenever the file object resides on a writable file system, the
server should make best efforts to record time_access into stable server should make best efforts to record time_access into stable
storage. However, to mitigate the performance effects of doing so, storage. However, to mitigate the performance effects of doing so,
and most especially whenever the server is satisfying the read of the and most especially whenever the server is satisfying the read of the
object's content from its cache, the server MAY cache access time object's content from its cache, the server MAY cache access time
updates and lazily write them to stable storage. It is also updates and lazily write them to stable storage. It is also
acceptable to give administrators of the server the option to disable acceptable to give administrators of the server the option to disable
time_access updates. time_access updates.
5.7.50. Attribute 48: time_access_set 5.7.52. Attribute 48: time_access_set
Set the time of last access to the object. SETATTR use only. Set the time of last access to the object. SETATTR use only.
5.7.51. Attribute 49: time_backup 5.7.53. Attribute 49: time_backup
The time of last backup of the object. The time of last backup of the object.
5.7.52. Attribute 50: time_create 5.7.54. Attribute 50: time_create
The time of creation of the object. This attribute does not have any The time of creation of the object. This attribute does not have any
relation to the traditional UNIX file attribute "ctime" or "change relation to the traditional UNIX file attribute "ctime" or "change
time". time".
5.7.53. Attribute 51: time_delta 5.7.55. Attribute 51: time_delta
Smallest useful server time granularity. Smallest useful server time granularity.
5.7.54. Attribute 52: time_metadata 5.7.56. Attribute 52: time_metadata
The time of last meta-data modification of the object. The time of last meta-data modification of the object.
5.7.55. Attribute 53: time_modify 5.7.57. Attribute 53: time_modify
The time of last modification to the object. The time of last modification to the object.
5.7.56. Attribute 54: time_modify_set 5.7.58. Attribute 54: time_modify_set
Set the time of last modification to the object. SETATTR use only. Set the time of last modification to the object. SETATTR use only.
5.8. Interpreting owner and owner_group 5.8. Interpreting owner and owner_group
The recommended attributes "owner" and "owner_group" (and also users The recommended attributes "owner" and "owner_group" (and also users
and groups within the "acl" attribute) are represented in terms of a and groups within the "acl" attribute) are represented in terms of a
UTF-8 string. To avoid a representation that is tied to a particular UTF-8 string. To avoid a representation that is tied to a particular
underlying implementation at the client or server, the use of the underlying implementation at the client or server, the use of the
UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [33] UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [33]
skipping to change at page 105, line 50 skipping to change at page 109, line 10
In the case where there is no translation available to the client or In the case where there is no translation available to the client or
server, the attribute value must be constructed without the "@". server, the attribute value must be constructed without the "@".
Therefore, the absence of the @ from the owner or owner_group Therefore, the absence of the @ from the owner or owner_group
attribute signifies that no translation was available at the sender attribute signifies that no translation was available at the sender
and that the receiver of the attribute should not use that string as and that the receiver of the attribute should not use that string as
a basis for translation into its own internal format. Even though a basis for translation into its own internal format. Even though
the attribute value cannot be translated, it may still be useful. the attribute value cannot be translated, it may still be useful.
In the case of a client, the attribute string may be used for local In the case of a client, the attribute string may be used for local
display of ownership. display of ownership.
To provide a greater degree of compatibility with previous versions To provide a greater degree of compatibility with NFSv3, which
of NFS (i.e. v2 and v3), which identified users and groups by 32-bit identified users and groups by 32-bit unsigned uid's and gid's, owner
unsigned uid's and gid's, owner and group strings that consist of and group strings that consist of decimal numeric values with no
decimal numeric values with no leading zeros can be given a special leading zeros can be given a special interpretation by clients and
interpretation by clients and servers which choose to provide such servers which choose to provide such support. The receiver may treat
support. The receiver may treat such a user or group string as such a user or group string as representing the same user as would be
representing the same user as would be represented by a v2/v3 uid or represented by an NFSv3 uid or gid having the corresponding numeric
gid having the corresponding numeric value. A server is not value. A server is not obligated to accept such a string, but may
obligated to accept such a string, but may return an NFS4ERR_BADOWNER return an NFS4ERR_BADOWNER instead. To avoid this mechanism being
instead. To avoid this mechanism being used to subvert user and used to subvert user and group translation, so that a client might
group translation, so that a client might pass all of the owners and pass all of the owners and groups in numeric form, a server SHOULD
groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER return an NFS4ERR_BADOWNER error when there is a valid translation
error when there is a valid translation for the user or owner for the user or owner designated in this way. In that case, the
designated in this way. In that case, the client must use the client must use the appropriate name@domain string and not the
appropriate name@domain string and not the special form for special form for compatibility.
compatibility.
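The special numeric form described above can be recognized with a check along the following lines. This is a sketch under assumptions: the function name is hypothetical, the single digit "0" is assumed to count as having no leading zeros, and values that do not fit in a 32-bit unsigned integer are rejected because the uid's and gid's being emulated are 32-bit.

```python
import re
from typing import Optional

# Decimal digits only, no leading zeros; "0" itself is accepted
# (an assumption, the text does not single this case out).
_NUMERIC_OWNER = re.compile(r"0|[1-9][0-9]*")
UINT32_MAX = 0xFFFFFFFF


def numeric_owner_to_id(owner: str) -> Optional[int]:
    """Return the uid/gid encoded by a numeric owner string, or None
    if the string is not in the special compatibility form."""
    if _NUMERIC_OWNER.fullmatch(owner) is None:
        return None  # e.g. "name@domain", "nobody", or leading zeros
    value = int(owner)
    return value if value <= UINT32_MAX else None
```

A server that supports this form would still be expected to return NFS4ERR_BADOWNER when a valid name@domain translation exists for the resulting id. That policy decision is outside this helper.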
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
5.9. Character Case Attributes 5.9. Character Case Attributes
With respect to the case_insensitive and case_preserving attributes, With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive each UCS-4 character (which UTF-8 encodes) has a "long descriptive
name" RFC1345 [34] which may or may not include the word "CAPITAL" name" RFC1345 [34] which may or may not include the word "CAPITAL"
or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to
implement unambiguous and efficient table driven mappings for case implement unambiguous and efficient table driven mappings for case
insensitive comparisons, and non-case-preserving storage. For insensitive comparisons, and non-case-preserving storage. For
general character handling and internationalization issues, see the general character handling and internationalization issues, see
section Internationalization (Section 14). Section 14.
5.10. Directory Notification Attributes 5.10. Directory Notification Attributes
As described in Section 18.39, the client can request a minimum delay As described in Section 18.39, the client can request a minimum delay
for notifications of changes to attributes, but the server is free to for notifications of changes to attributes, but the server is free to
ignore what the client requests. The client can determine in advance ignore what the client requests. The client can determine in advance
what notification delays the server will accept by issuing a GETATTR what notification delays the server will accept by issuing a GETATTR
for either or both of two directory notification attributes. When for either or both of two directory notification attributes. When
the client calls the GET_DIR_DELEGATION operation and asks for the client calls the GET_DIR_DELEGATION operation and asks for
attribute change notifications, it should request notification delays attribute change notifications, it should request notification delays
skipping to change at page 107, line 15 skipping to change at page 110, line 21
5.10.2. Attribute 57: dirent_notif_delay 5.10.2. Attribute 57: dirent_notif_delay
The dirent_notif_delay attribute is the minimum number of seconds the The dirent_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to a file server will delay before notifying the client of a change to a file
object that has an entry in the directory. object that has an entry in the directory.
5.11. pNFS Attribute Definitions 5.11. pNFS Attribute Definitions
5.11.1. Attribute 62: fs_layout_type 5.11.1. Attribute 62: fs_layout_type
The fs_layout_type attribute (data type layouttype4 (Section 3.2.15)) The fs_layout_type attribute (data type layouttype4 (Section 3.3.13))
applies to a file system and indicates what layout types are applies to a file system and indicates what layout types are
supported by the file system. When the client encounters a new fsid, supported by the file system. When the client encounters a new fsid,
the client should obtain the value for the fs_layout_type attribute the client should obtain the value for the fs_layout_type attribute
associated with the new file system. This attribute is used by the associated with the new file system. This attribute is used by the
client to determine if the layout types supported by the server match client to determine if the layout types supported by the server match
any of the client's supported layout types. any of the client's supported layout types.
5.11.2. Attribute 66: layout_alignment 5.11.2. Attribute 66: layout_alignment
When a client has layouts for a file system, the layout_alignment When a client has layouts for a file system, the layout_alignment
attribute indicates the preferred alignment for I/O to files on that attribute indicates the preferred alignment for I/O to files on that
file system. Where possible, the client should issue READ and WRITE file system. Where possible, the client should send READ and WRITE
operations with offsets that are whole multiples of the operations with offsets that are whole multiples of the
layout_alignment attribute. layout_alignment attribute.
5.11.3. Attribute 65: layout_blksize 5.11.3. Attribute 65: layout_blksize
When a client has layouts for a file system, the layout_blksize When a client has layouts for a file system, the layout_blksize
attribute indicates the preferred block size for I/O to files on that attribute indicates the preferred block size for I/O to files on that
file system. Where possible, the client should issue READ operations file system. Where possible, the client should send READ operations
with a count argument that is a whole multiple of layout_blksize, and with a count argument that is a whole multiple of layout_blksize, and
WRITE operations with a data argument of size that is a whole WRITE operations with a data argument of size that is a whole
multiple of layout_blksize. multiple of layout_blksize.
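One way a client could apply layout_alignment and layout_blksize to a READ is sketched below. It is only an illustration under assumptions: both attribute values are assumed to be positive integers, the function names are hypothetical, and the request is simply widened to cover the originally requested range. A WRITE cannot be widened this way without first obtaining the surrounding data.

```python
from typing import Tuple


def align_down(value: int, unit: int) -> int:
    """Largest multiple of unit that is <= value."""
    return value - (value % unit)


def align_up(value: int, unit: int) -> int:
    """Smallest multiple of unit that is >= value."""
    return -(-value // unit) * unit


def aligned_read(offset: int, count: int,
                 layout_alignment: int, layout_blksize: int) -> Tuple[int, int]:
    """Widen a READ so that its offset is a whole multiple of
    layout_alignment and its count a whole multiple of layout_blksize,
    while still covering [offset, offset + count)."""
    start = align_down(offset, layout_alignment)
    length = align_up(offset + count - start, layout_blksize)
    return start, length
```

For example, with both attributes equal to 4096, a request for 3000 bytes at offset 5000 becomes a request for 4096 bytes at offset 4096.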
5.11.4. Attribute 63: layout_hint 5.11.4. Attribute 63: layout_hint
The layout_hint attribute (data type layouthint4 (Section 3.2.22)) The layout_hint attribute (data type layouthint4 (Section 3.3.20))
may be set on newly created files to influence the metadata server's may be set on newly created files to influence the metadata server's
choice for the file's layout. If possible, this attribute is one of choice for the file's layout. If possible, this attribute is one of
those set in the initial attributes within the OPEN operation. The those set in the initial attributes within the OPEN operation. The
metadata server may choose to ignore this attribute. The layout_hint metadata server may choose to ignore this attribute. The layout_hint
attribute is a sub-set of the layout structure returned by LAYOUTGET. attribute is a sub-set of the layout structure returned by LAYOUTGET.
For example, instead of specifying particular devices, this would be For example, instead of specifying particular devices, this would be
used to suggest the stripe width of a file. The server used to suggest the stripe width of a file. The server
implementation determines which fields within the layout will be implementation determines which fields within the layout will be
used. used.
skipping to change at page 108, line 16 skipping to change at page 111, line 21
This attribute lists the layout type(s) available for a file. The This attribute lists the layout type(s) available for a file. The
value returned by the server is for informational purposes only. The value returned by the server is for informational purposes only. The
client will use the LAYOUTGET operation to obtain the information client will use the LAYOUTGET operation to obtain the information
needed in order to perform I/O. For example, the specific device needed in order to perform I/O. For example, the specific device
information for the file and its layout. information for the file and its layout.
5.11.6. Attribute 68: mdsthreshold 5.11.6. Attribute 68: mdsthreshold
This attribute is a server provided hint used to communicate to the This attribute is a server provided hint used to communicate to the
client when it is more efficient to issue READ and WRITE operations client when it is more efficient to send READ and WRITE operations to
to the metadata server or the data server. The two types of the metadata server or the data server. The two types of thresholds
thresholds described are file size thresholds and I/O size described are file size thresholds and I/O size thresholds. If a
thresholds. If a file's size is smaller than the file size file's size is smaller than the file size threshold, data accesses
threshold, data accesses should be issued to the metadata server. If should be sent to the metadata server. If an I/O is below the I/O
an I/O is below the I/O size threshold, the I/O should be issued to size threshold, the I/O should be sent to the metadata server. As
the metadata server. As defined, each threshold type is specified defined, each threshold type is specified separately for READ and
separately for READ and WRITE. WRITE.
The server may provide both types of thresholds for a file. If both The server may provide both types of thresholds for a file. If both
file size and I/O size are provided, the client should exceed both file size and I/O size are provided, the client should exceed both
thresholds before issuing its READ or WRITE requests to the data thresholds before issuing its READ or WRITE requests to the data
server. Alternatively, if only one of the specified thresholds is server. Alternatively, if only one of the specified thresholds is
exceeded, the I/O requests are issued to the metadata server. exceeded, the I/O requests are sent to the metadata server.
For each threshold type, a value of 0 indicates no READ or WRITE For each threshold type, a value of 0 indicates no READ or WRITE
should be issued to the metadata server, while a value of all 1s should be sent to the metadata server, while a value of all 1s
indicates all READs or WRITEs should be issued to the metadata indicates all READs or WRITEs should be sent to the metadata server.
server.
The attribute is available on a per filehandle basis. If the current The attribute is available on a per filehandle basis. If the current
filehandle refers to a non-pNFS file or directory, the metadata filehandle refers to a non-pNFS file or directory, the metadata
server should return an attribute that is representative of the server should return an attribute that is representative of the
filehandle's file system. It is suggested that this attribute is filehandle's file system. It is suggested that this attribute is
queried as part of the OPEN operation. Due to dynamic system queried as part of the OPEN operation. Due to dynamic system
changes, the client should not assume that the attribute will remain changes, the client should not assume that the attribute will remain
constant for any specific time period; thus, it should be periodically constant for any specific time period; thus, it should be periodically
refreshed. refreshed.
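
The threshold rules above can be sketched as a small client-side
decision function. This is only an illustrative sketch, not part of
the protocol definition: the function names, the use of None for a
hint the server did not provide, the 64-bit width assumed for "all
1s", and the choice to use the data server when no hint is present
are all assumptions made for the example.

```python
from typing import Optional

# Assumption: thresholds are 64-bit unsigned values, so "all 1s" is 2**64 - 1.
ALL_ONES = 0xFFFFFFFFFFFFFFFF


def exceeds(value: int, threshold: Optional[int]) -> bool:
    """Return True when this threshold does not hold the I/O on the
    metadata server."""
    if threshold is None:
        # Hint not provided by the server: it imposes no constraint
        # (an assumption of this sketch).
        return True
    if threshold == ALL_ONES:
        # All 1s: every READ or WRITE goes to the metadata server.
        return False
    # A value smaller than the threshold stays on the metadata server.
    # A threshold of 0 is therefore always exceeded.
    return value >= threshold


def use_data_server(file_size: int, io_size: int,
                    size_threshold: Optional[int] = None,
                    io_threshold: Optional[int] = None) -> bool:
    """Decide where one READ (or one WRITE) should be sent, using the
    thresholds for that operation type. Both provided thresholds must
    be exceeded before the data server is used."""
    return exceeds(file_size, size_threshold) and exceeds(io_size, io_threshold)
```

A client would call use_data_server() once with the READ thresholds
and once with the WRITE thresholds, since each threshold type is
specified separately for the two operations.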
skipping to change at page 111, line 43 skipping to change at page 115, line 5
* Setting only the mode attribute should effectively control the * Setting only the mode attribute should effectively control the
traditional UNIX-like permissions of read, write, and execute traditional UNIX-like permissions of read, write, and execute
on owner, owner_group, and other. on owner, owner_group, and other.
* Setting only the mode attribute should provide reasonable * Setting only the mode attribute should provide reasonable
security. For example, setting a mode of 000 should be enough security. For example, setting a mode of 000 should be enough
to ensure that future opens for read or write by any principal to ensure that future opens for read or write by any principal
fail, regardless of a previously existing or inherited ACL. fail, regardless of a previously existing or inherited ACL.
o This minor version of NFSv4 may introduce different semantics o NFSv4.1 may introduce different semantics relating to the mode and
relating to the mode and ACL attributes, but it does not render ACL attributes, but it does not render invalid any previously
invalid any previously existing implementations. Additionally, existing implementations. Additionally, this chapter provides
this chapter provides clarifications based on previous clarifications based on previous implementations and discussions
implementations and discussions around them. around them.
o On servers that support both the mode and the acl or dacl o On servers that support both the mode and the acl or dacl
attributes, the server must keep the two consistent with each attributes, the server must keep the two consistent with each
other. The value of the mode attribute (with the exception of the other. The value of the mode attribute (with the exception of the
three high order bits described in Section 6.2.4), must be three high order bits described in Section 6.2.4), must be
determined entirely by the value of the ACL, so that use of the determined entirely by the value of the ACL, so that use of the
mode is never required for anything other than setting the three mode is never required for anything other than setting the three
high order bits. See Section 6.4.1 for exact requirements. high order bits. See Section 6.4.1 for exact requirements.
o When a mode attribute is set on an object, the ACL attributes may o When a mode attribute is set on an object, the ACL attributes may
need to be modified so as to not conflict with the new mode. In need to be modified so as to not conflict with the new mode. In
such cases, it is desirable that the ACL keep as much information such cases, it is desirable that the ACL keep as much information
as possible. This includes information about inheritance, AUDIT as possible. This includes information about inheritance, AUDIT
and ALARM ACEs, and permissions granted and denied that do not and ALARM ACEs, and permissions granted and denied that do not
conflict with the new mode. conflict with the new mode.
6.2. File Attributes Discussion 6.2. File Attributes Discussion
6.2.1. Attribute 12: acl 6.2.1. Attribute 12: acl
The NFS version 4 ACL attribute contains an array of access control The NFSv4.1 ACL attribute contains an array of access control entries
entries (ACEs) that are associated with the file system object. (ACEs) that are associated with the file system object. Although the
Although the client can read and write the acl attribute, the server client can read and write the acl attribute, the server is
is responsible for using the ACL to perform access control. The responsible for using the ACL to perform access control. The client
client can use the OPEN or ACCESS operations to check access without can use the OPEN or ACCESS operations to check access without
modifying or reading data or metadata. modifying or reading data or metadata.
The NFS ACE structure is defined as follows: The NFS ACE structure is defined as follows:
typedef uint32_t acetype4; typedef uint32_t acetype4;
typedef uint32_t aceflag4; typedef uint32_t aceflag4;
typedef uint32_t acemask4; typedef uint32_t acemask4;
struct nfsace4 { struct nfsace4 {
acetype4 type; acetype4 type;
aceflag4 flag; aceflag4 flag;
acemask4 access_mask; acemask4 access_mask;
utf8str_mixed who; utf8str_mixed who;
}; };
To determine if a request succeeds, the server processes each nfsace4 To determine if a request succeeds, the server processes each nfsace4
skipping to change at page 113, line 8 skipping to change at page 116, line 23
in common with the "access_mask" of the ACE, the request is denied. in common with the "access_mask" of the ACE, the request is denied.
When the ACL is fully processed, if there are bits in the requester's When the ACL is fully processed, if there are bits in the requester's
mask that have not been ALLOWED or DENIED, access is denied. mask that have not been ALLOWED or DENIED, access is denied.
Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do
not affect a requester's access, and instead are for triggering not affect a requester's access, and instead are for triggering
events as a result of a requester's access attempt. Therefore, AUDIT events as a result of a requester's access attempt. Therefore, AUDIT
and ALARM ACEs are processed only after processing ALLOW and DENY and ALARM ACEs are processed only after processing ALLOW and DENY
ACEs. ACEs.
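
The ALLOW/DENY processing described above can be sketched as
follows. This is a simplified illustration rather than a normative
algorithm: the Ace class mirrors the nfsace4 fields, but the
access_allowed() helper and the idea of matching "who" against a
plain set of identity strings are assumptions of the example. A real
server must also resolve special identifiers, group membership, and
ACE flags.

```python
from dataclasses import dataclass
from typing import Iterable, Set

ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001

ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002
ACE4_EXECUTE = 0x00000020


@dataclass
class Ace:
    """Mirror of the nfsace4 structure."""
    type: int
    flag: int
    access_mask: int
    who: str


def access_allowed(acl: Iterable[Ace], requester_ids: Set[str],
                   requested_mask: int) -> bool:
    """Process ACEs in order; return True only if every requested bit
    is ALLOWED before any still-pending bit is DENIED."""
    remaining = requested_mask
    for ace in acl:
        # Hypothetical matching rule: the ACE applies if its "who" is
        # one of the identities the requester is known by.
        if ace.who not in requester_ids:
            continue
        if ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            # Bits that are ALLOWED are no longer considered later.
            remaining &= ~ace.access_mask
            if remaining == 0:
                return True
        elif ace.type == ACE4_ACCESS_DENIED_ACE_TYPE:
            # A DENY ACE matters only for bits not yet ALLOWED.
            if remaining & ace.access_mask:
                return False
    # Bits neither ALLOWED nor DENIED are denied.
    return remaining == 0
```

Because order matters, a DENY ACE placed before a broader ALLOW ACE
blocks the named requester while leaving others unaffected. AUDIT and
ALARM ACEs are ignored here because they do not affect the access
decision.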
The NFS version 4 ACL model is quite rich. Some server platforms may The NFSv4.1 ACL model is quite rich. Some server platforms may
provide access control functionality that goes beyond the UNIX-style provide access control functionality that goes beyond the UNIX-style
mode attribute, but which is not as rich as the NFS ACL model. So mode attribute, but which is not as rich as the NFS ACL model. So
that users can take advantage of this more limited functionality, the that users can take advantage of this more limited functionality, the
server may support the acl attributes by mapping between its ACL server may support the acl attributes by mapping between its ACL
model and the NFS version 4 ACL model. Servers must ensure that the model and the NFSv4.1 ACL model. Servers must ensure that the ACL
ACL they actually store or enforce is at least as strict as the NFSv4 they actually store or enforce is at least as strict as the NFSv4 ACL
ACL that was set. It is tempting to accomplish this by rejecting any that was set. It is tempting to accomplish this by rejecting any ACL
ACL that falls outside the small set that can be represented that falls outside the small set that can be represented accurately.
accurately. However, such an approach can render ACLs unusable However, such an approach can render ACLs unusable without special
without special client-side knowledge of the server's mapping, which client-side knowledge of the server's mapping, which defeats the
defeats the purpose of having a common NFSv4 ACL protocol. Therefore purpose of having a common NFSv4 ACL protocol. Therefore servers
servers should accept every ACL that they can without compromising should accept every ACL that they can without compromising security.
security. To help accomplish this, servers may make a special To help accomplish this, servers may make a special exception, in the
exception, in the case of unsupported permission bits, to the rule case of unsupported permission bits, to the rule that bits not
that bits not ALLOWED or DENIED by an ACL must be denied. For ALLOWED or DENIED by an ACL must be denied. For example, a UNIX-
example, a UNIX-style server might choose to silently allow read style server might choose to silently allow read attribute
attribute permissions even though an ACL does not explicitly allow permissions even though an ACL does not explicitly allow those
those permissions. (An ACL that explicitly denies permission to read permissions. (An ACL that explicitly denies permission to read
attributes should still be rejected.) attributes should still be rejected.)
The situation is complicated by the fact that a server may have The situation is complicated by the fact that a server may have
multiple modules that enforce ACLs. For example, the enforcement for multiple modules that enforce ACLs. For example, the enforcement for
NFS version 4 access may be different from, but not weaker than, the NFSv4.1 access may be different from, but not weaker than, the
enforcement for local access, and both may be different from the enforcement for local access, and both may be different from the
enforcement for access through other protocols such as SMB. So it enforcement for access through other protocols such as SMB. So it
may be useful for a server to accept an ACL even if not all of its may be useful for a server to accept an ACL even if not all of its
modules are able to support it. modules are able to support it.
The guiding principle with regard to NFSv4 access is that the server The guiding principle with regard to NFSv4 access is that the server
must not accept ACLs that appear to make the file more secure than it must not accept ACLs that appear to make the file more secure than it
really is. really is.
6.2.1.1. ACE Type 6.2.1.1. ACE Type
skipping to change at page 115, line 34 skipping to change at page 119, line 14
const ACE4_READ_DATA = 0x00000001; const ACE4_READ_DATA = 0x00000001;
const ACE4_LIST_DIRECTORY = 0x00000001; const ACE4_LIST_DIRECTORY = 0x00000001;
const ACE4_WRITE_DATA = 0x00000002; const ACE4_WRITE_DATA = 0x00000002;
const ACE4_ADD_FILE = 0x00000002; const ACE4_ADD_FILE = 0x00000002;
const ACE4_APPEND_DATA = 0x00000004; const ACE4_APPEND_DATA = 0x00000004;
const ACE4_ADD_SUBDIRECTORY = 0x00000004; const ACE4_ADD_SUBDIRECTORY = 0x00000004;
const ACE4_READ_NAMED_ATTRS = 0x00000008; const ACE4_READ_NAMED_ATTRS = 0x00000008;
const ACE4_WRITE_NAMED_ATTRS = 0x00000010; const ACE4_WRITE_NAMED_ATTRS = 0x00000010;
const ACE4_EXECUTE = 0x00000020; const ACE4_EXECUTE = 0x00000020;
const ACE4_TRAVERSE = 0x00000020;
const ACE4_DELETE_CHILD = 0x00000040; const ACE4_DELETE_CHILD = 0x00000040;
const ACE4_READ_ATTRIBUTES = 0x00000080; const ACE4_READ_ATTRIBUTES = 0x00000080;
const ACE4_WRITE_ATTRIBUTES = 0x00000100; const ACE4_WRITE_ATTRIBUTES = 0x00000100;
const ACE4_WRITE_RETENTION = 0x00000200; const ACE4_WRITE_RETENTION = 0x00000200;
const ACE4_WRITE_RETENTION_HOLD = 0x00000400; const ACE4_WRITE_RETENTION_HOLD = 0x00000400;
const ACE4_DELETE = 0x00010000; const ACE4_DELETE = 0x00010000;
const ACE4_READ_ACL = 0x00020000; const ACE4_READ_ACL = 0x00020000;
const ACE4_WRITE_ACL = 0x00040000; const ACE4_WRITE_ACL = 0x00040000;
const ACE4_WRITE_OWNER = 0x00080000; const ACE4_WRITE_OWNER = 0x00080000;
const ACE4_SYNCHRONIZE = 0x00100000; const ACE4_SYNCHRONIZE = 0x00100000;
Note that some masks have coincident values, for example, Note that some masks have coincident values, for example,
ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries ACE4_READ_DATA and ACE4_LIST_DIRECTORY. The mask entries
ACE4_LIST_DIRECTORY, ACE4_ADD_SUBDIRECTORY, and ACE4_TRAVERSE are ACE4_LIST_DIRECTORY, ACE4_ADD_SUBDIRECTORY, and ACE4_TRAVERSE are
intended to be used with directory objects, while ACE4_READ_DATA, intended to be used with directory objects, while ACE4_READ_DATA,
skipping to change at page 120, line 19 skipping to change at page 123, line 40
Operation(s) affected: Operation(s) affected:
REMOVE REMOVE
RENAME RENAME
Discussion: Discussion:
Permission to delete a file or directory within a directory. Permission to delete a file or directory within a directory.
See section "ACE4_DELETE vs. ACE4_DELETE_CHILD" for information See Section 6.2.1.3.2 for information on how ACE4_DELETE and
on how these two access mask bits interact. ACE4_DELETE_CHILD interact.
ACE4_READ_ATTRIBUTES ACE4_READ_ATTRIBUTES
Operation(s) affected: Operation(s) affected:
GETATTR of file system object attributes GETATTR of file system object attributes
VERIFY VERIFY
NVERIFY NVERIFY
READDIR READDIR
Discussion: Discussion:
The ability to read basic attributes (non-ACLs) of a file. On The ability to read basic attributes (non-ACLs) of a file. On
a UNIX system, basic attributes can be thought of as the stat a UNIX system, basic attributes can be thought of as the stat
skipping to change at page 121, line 47 skipping to change at page 125, line 25
ACE4_WRITE_RETENTION_HOLD. ACE4_WRITE_RETENTION_HOLD.
ACE4_DELETE ACE4_DELETE
Operation(s) affected: Operation(s) affected:
REMOVE REMOVE
Discussion: Discussion:
Permission to delete the file or directory. See section Permission to delete the file or directory. See
"ACE4_DELETE vs. ACE4_DELETE_CHILD" for information on how Section 6.2.1.3.2 for information on how ACE4_DELETE and
these two access mask bits interact. ACE4_DELETE_CHILD interact.
ACE4_READ_ACL ACE4_READ_ACL
Operation(s) affected: Operation(s) affected:
GETATTR of acl, dacl, or sacl GETATTR of acl, dacl, or sacl
NVERIFY NVERIFY
VERIFY VERIFY
skipping to change at page 127, line 21 skipping to change at page 131, line 7
6.2.3. Attribute 59: sacl 6.2.3. Attribute 59: sacl
The sacl and dacl attributes are like the acl attribute, but dacl The sacl and dacl attributes are like the acl attribute, but dacl
and sacl each allow only certain types of ACEs. The sacl attribute and sacl each allow only certain types of ACEs. The sacl attribute
allows just AUDIT and ALARM ACEs. The dacl and sacl attributes also allows just AUDIT and ALARM ACEs. The dacl and sacl attributes also
support automatic inheritance (see Section 6.4.3.2). support automatic inheritance (see Section 6.4.3.2).
6.2.4. Attribute 33: mode 6.2.4. Attribute 33: mode
The NFS version 4 mode attribute is based on the UNIX mode bits. The The NFSv4.1 mode attribute is based on the UNIX mode bits. The
following bits are defined: following bits are defined:
const MODE4_SUID = 0x800; /* set user id on execution */ const MODE4_SUID = 0x800; /* set user id on execution */
const MODE4_SGID = 0x400; /* set group id on execution */ const MODE4_SGID = 0x400; /* set group id on execution */
const MODE4_SVTX = 0x200; /* save text even after use */ const MODE4_SVTX = 0x200; /* save text even after use */
const MODE4_RUSR = 0x100; /* read permission: owner */ const MODE4_RUSR = 0x100; /* read permission: owner */
const MODE4_WUSR = 0x080; /* write permission: owner */ const MODE4_WUSR = 0x080; /* write permission: owner */
const MODE4_XUSR = 0x040; /* execute permission: owner */ const MODE4_XUSR = 0x040; /* execute permission: owner */
const MODE4_RGRP = 0x020; /* read permission: group */ const MODE4_RGRP = 0x020; /* read permission: group */
const MODE4_WGRP = 0x010; /* write permission: group */ const MODE4_WGRP = 0x010; /* write permission: group */
skipping to change at page 129, line 38 skipping to change at page 133, line 22
Clients SHOULD NOT do their own access checks based on their Clients SHOULD NOT do their own access checks based on their
interpretation of the ACL, but rather use the OPEN and ACCESS operations interpretation of the ACL, but rather use the OPEN and ACCESS operations
to do access checks. This allows the client to act on the results of to do access checks. This allows the client to act on the results of
having the server determine whether or not access should be granted having the server determine whether or not access should be granted
based on its interpretation of the ACL. based on its interpretation of the ACL.
Clients must be aware of situations in which an object's ACL will Clients must be aware of situations in which an object's ACL will
define a certain access even though the server will not enforce it. define a certain access even though the server will not enforce it.
In general, but especially in these situations, the client needs to In general, but especially in these situations, the client needs to
do its part in the enforcement of access as defined by the ACL. To do its part in the enforcement of access as defined by the ACL. To
do this, the client MAY issue the appropriate ACCESS operation prior do this, the client MAY send the appropriate ACCESS operation prior
to servicing the request of the user or application in order to to servicing the request of the user or application in order to
determine whether the user or application should be granted the determine whether the user or application should be granted the
access requested. For examples in which the ACL may define accesses access requested. For examples in which the ACL may define accesses
that the server doesn't enforce, see Section 6.3.1.1. that the server doesn't enforce, see Section 6.3.1.1.
6.3.2. Computing a Mode Attribute from an ACL 6.3.2. Computing a Mode Attribute from an ACL
The following method can be used to calculate the MODE4_R*, MODE4_W* The following method can be used to calculate the MODE4_R*, MODE4_W*
and MODE4_X* bits of a mode attribute, based upon an ACL. and MODE4_X* bits of a mode attribute, based upon an ACL.
skipping to change at page 133, line 41 skipping to change at page 137, line 27
3. If both mode and ACL are given in the call: 3. If both mode and ACL are given in the call:
In this case, inheritance SHOULD NOT take place, and both In this case, inheritance SHOULD NOT take place, and both
attributes will be set as described in Section 6.4.1.3. attributes will be set as described in Section 6.4.1.3.
4. If neither mode nor ACL are given in the call: 4. If neither mode nor ACL are given in the call:
In the case where an object is being created without any initial In the case where an object is being created without any initial
attributes at all, e.g. an OPEN operation with an opentype4 of attributes at all, e.g. an OPEN operation with an opentype4 of
OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD
NOT take place. Instead, the server SHOULD set permissions to NOT take place (note that EXCLUSIVE4_1 is a better choice of
deny all access to the newly created object. It is expected that createmode4, since it does permit initial attributes). Instead,
the appropriate client will set the desired attributes in a the server SHOULD set permissions to deny all access to the newly
subsequent SETATTR operation, and the server SHOULD allow that created object. It is expected that the appropriate client will
operation to succeed, regardless of what permissions the object set the desired attributes in a subsequent SETATTR operation, and
is created with. For example, an empty ACL denies all the server SHOULD allow that operation to succeed, regardless of
permissions, but the server should allow the owner's SETATTR to what permissions the object is created with. For example, an
succeed even though WRITE_ACL is implicitly denied. empty ACL denies all permissions, but the server should allow the
owner's SETATTR to succeed even though WRITE_ACL is implicitly
denied.
In other cases, inheritance SHOULD take place, and no In other cases, inheritance SHOULD take place, and no
modifications to the ACL will happen. The mode attribute, if modifications to the ACL will happen. The mode attribute, if
supported, MUST be as computed in Section 6.3.2, with the supported, MUST be as computed in Section 6.3.2, with the
MODE4_SUID, MODE4_SGID and MODE4_SVTX bits clear. If no MODE4_SUID, MODE4_SGID and MODE4_SVTX bits clear. If no
inheritable ACEs exist on the parent directory, the rules for inheritable ACEs exist on the parent directory, the rules for
creating acl, dacl or sacl attributes are implementation defined. creating acl, dacl or sacl attributes are implementation defined.
If either the dacl or sacl attribute is supported, then the If either the dacl or sacl attribute is supported, then the
ACL4_DEFAULTED flag SHOULD be set on the newly created ACL4_DEFAULTED flag SHOULD be set on the newly created
attributes. attributes.
skipping to change at page 137, line 22 skipping to change at page 141, line 12
of the namespace are made available via an "export" feature. In of the namespace are made available via an "export" feature. In
previous versions of the NFS protocol, the root filehandle for each previous versions of the NFS protocol, the root filehandle for each
export is obtained through the MOUNT protocol; the client sent a export is obtained through the MOUNT protocol; the client sent a
string that identified the export name within the namespace and the string that identified the export name within the namespace and the
server returned the root filehandle for that export. The MOUNT server returned the root filehandle for that export. The MOUNT
protocol also provided an EXPORTS procedure that enumerated the server's protocol also provided an EXPORTS procedure that enumerated the server's
exports. exports.
7.2. Browsing Exports 7.2. Browsing Exports
The NFS version 4 protocol provides a root filehandle that clients The NFSv4.1 protocol provides a root filehandle that clients can use
can use to obtain filehandles for the exports of a particular server, to obtain filehandles for the exports of a particular server, via a
via a series of LOOKUP operations within a COMPOUND, to traverse a series of LOOKUP operations within a COMPOUND, to traverse a path. A
path. A common user experience is to use a graphical user interface common user experience is to use a graphical user interface (perhaps
(perhaps a file "Open" dialog window) to find a file via progressive a file "Open" dialog window) to find a file via progressive browsing
browsing through a directory tree. The client must be able to move through a directory tree. The client must be able to move from one
from one export to another export via single-component, progressive export to another export via single-component, progressive LOOKUP
LOOKUP operations. operations.
This style of browsing is not well supported by the NFS version 2 and This style of browsing is not well supported by the NFSv3 protocol.
3 protocols. In these versions of NFS, the client expects all LOOKUP In NFSv3, the client expects all LOOKUP operations to remain within a
operations to remain within a single server file system. For single server file system. For example, the device attribute will
example, the device attribute will not change. This prevents a not change. This prevents a client from taking namespace paths that
client from taking namespace paths that span exports. span exports.
In the case of Versions 2 and 3, an automounter on the client can In the case of NFSv3, an automounter on the client can obtain a
obtain a snapshot of the server's namespace using the EXPORTS snapshot of the server's namespace using the EXPORTS procedure of the
procedure of the MOUNT protocol. If it understands the server's MOUNT protocol. If it understands the server's pathname syntax, it
pathname syntax, it can create an image of the server's namespace on can create an image of the server's namespace on the client. The
the client. The parts of the namespace that are not exported by the parts of the namespace that are not exported by the server are filled
server are filled in with directories that might be arranged similarly in with directories that might be constructed similarly to an NFSv4.1
to a version 4 "pseudo file system" that allows the user to browse "pseudo file system" (see Section 7.3) that allows the user to browse
from one mounted file system to another. There is a drawback to this from one mounted file system to another. There is a drawback to this
representation of the server's namespace on the client: it is static. representation of the server's namespace on the client: it is static.
If the server administrator adds a new export, the client will be If the server administrator adds a new export, the client will be
unaware of it. unaware of it.
7.3. Server Pseudo File System 7.3. Server Pseudo File System
NFS version 4 servers avoid this namespace inconsistency by NFSv4.1 servers avoid this namespace inconsistency by presenting all
presenting all the exports for a given server within the framework of the exports for a given server within the framework of a single
a single namespace, for that server. An NFS version 4 client uses namespace, for that server. An NFSv4.1 client uses LOOKUP and
LOOKUP and READDIR operations to browse seamlessly from one export to READDIR operations to browse seamlessly from one export to another.
another.
Where there are portions of the server namespace that are not Where there are portions of the server namespace that are not
exported, clients require some way of traversing those portions to exported, clients require some way of traversing those portions to
reach actual exported file systems. A technique that servers may use reach actual exported file systems. A technique that servers may use
to provide for this is to bridge unexported portions of the namespace to provide for this is to bridge unexported portions of the namespace
via a "pseudo file system" that provides a view of exported via a "pseudo file system" that provides a view of exported
directories only. A pseudo file system has a unique fsid and behaves directories only. A pseudo file system has a unique fsid and behaves
like a normal, read-only file system. like a normal, read-only file system.
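
As a rough illustration of the bridging idea above, the following
sketch derives which directories a server would have to synthesize so
that clients can walk from the root to each export. The function name
pseudo_directories() is hypothetical, and the sketch assumes that
exports are given as absolute POSIX-style paths and that no export is
nested inside another export; real servers must also account for file
system boundaries and assign fsids.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def pseudo_directories(exports: Iterable[str]) -> List[str]:
    """Return the unexported ancestor directories that a pseudo file
    system would need to present so every export is reachable from
    the root by single-component LOOKUPs.

    Assumption: exports are absolute paths and are not nested."""
    exported = {PurePosixPath(p) for p in exports}
    pseudo = set()
    for path in exported:
        # Every ancestor up to "/" must be traversable by the client.
        for ancestor in path.parents:
            if ancestor not in exported:
                pseudo.add(ancestor)
    return sorted(str(p) for p in pseudo)
```

For exports "/a/b/c" and "/a/d", the sketch yields "/", "/a", and
"/a/b" as pseudo directories, which show only the names that lead to
an export.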
Based on the construction of the server's namespace, it is possible Based on the construction of the server's namespace, it is possible
skipping to change at page 138, line 32 skipping to change at page 142, line 22
/a/b/c/d real file system /a/b/c/d real file system
Each of the pseudo file systems is considered a separate entity and Each of the pseudo file systems is considered a separate entity and
therefore MUST have its own fsid, unique among all the fsids for that therefore MUST have its own fsid, unique among all the fsids for that
server. server.
7.4. Multiple Roots 7.4. Multiple Roots
Certain operating environments are sometimes described as having Certain operating environments are sometimes described as having
"multiple roots". In such environments individual file systems are "multiple roots". In such environments individual file systems are
commonly represented by disk or volume names. NFS version 4 servers commonly represented by disk or volume names. NFSv4 servers for
for these platforms can construct a pseudo file system above these these platforms can construct a pseudo file system above these root
root names so that disk letters or volume names are simply directory names so that disk letters or volume names are simply directory names
names in the pseudo root. in the pseudo root.
7.5. Filehandle Volatility 7.5. Filehandle Volatility
The nature of the server's pseudo file system is that it is a logical The nature of the server's pseudo file system is that it is a logical
representation of file system(s) available from the server. representation of file system(s) available from the server.
Therefore, the pseudo file system is most likely constructed Therefore, the pseudo file system is most likely constructed
dynamically when the server is first instantiated. It is expected dynamically when the server is first instantiated. It is expected
that the pseudo file system may not have an on disk counterpart from that the pseudo file system may not have an on disk counterpart from
which persistent filehandles could be constructed. Even though it is which persistent filehandles could be constructed. Even though it is
preferable that the server provide persistent filehandles for the preferable that the server provide persistent filehandles for the
skipping to change at page 143, line 7 skipping to change at page 146, line 43
With the exception of special stateids, to be discussed later, each With the exception of special stateids, to be discussed later, each
stateid represents locking objects of one of a set of types defined stateid represents locking objects of one of a set of types defined
by the NFSv4.1 protocol. Note that in all these cases, where we by the NFSv4.1 protocol. Note that in all these cases, where we
speak of a guarantee, there is always an implied codicil that any speak of a guarantee, there is always an implied codicil that any
situation such as a client reboot, or lock revocation, allows the situation such as a client reboot, or lock revocation, allows the
guarantee to be voided. guarantee to be voided.
o Stateids may represent opens of files. o Stateids may represent opens of files.
o Stateids may represent sets of byte-range locks held on a Each stateid in this case represents the open for a given
particular file by a particular owner and all gotten under the clientid/openowner/filehandle triple. Such stateids are subject
aegis of a particular open file. to change (with consequent bumping of the seqid) in response to
OPENs that result in upgrade and OPEN_DOWNGRADE operations.
o Stateids may represent sets of byte-range locks.
All locks held on a particular file by a particular owner and all
gotten under the aegis of a particular open file are associated
with a single stateid with the seqid being bumped as LOCK and
LOCKU operations affect that set of locks.
o Stateids may represent file delegations, which are recallable o Stateids may represent file delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not reference, or will not modify a particular file, until the not reference, or will not modify a particular file, until the
delegation is returned. In NFSv4.1, file delegations may be delegation is returned. In NFSv4.1, file delegations may be
obtained on both regular and non-regular files. obtained on both regular and non-regular files.
A stateid represents a single delegation held by a client for a
particular filehandle.
o Stateids may represent directory delegations, which are recallable o Stateids may represent directory delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not modify the directory, until the delegation is returned. not modify the directory, until the delegation is returned.
A stateid represents a single delegation held by a client for a
particular directory filehandle.
o Stateids may represent layouts, which are recallable guarantees by o Stateids may represent layouts, which are recallable guarantees by
the server to the client, that particular files may be accessed the server to the client, that particular files may be accessed
via an alternate data access protocol at specific locations. Such via an alternate data access protocol at specific locations. Such
access is limited to particular sets of byte ranges and may access is limited to particular sets of byte ranges and may
proceed until those byte ranges are reduced or the layout is proceed until those byte ranges are reduced or the layout is
returned. returned.
o Stateids may represent device maps, which are recallable A stateid represents all layouts held by a particular client for a
guarantees by the server to the client, that the devices particular filehandle with a given layout type. The seqid is
designated by device IDs in layouts will not be changed while these updated as the contents of that set change with LAYOUT
devices are still held by the client.
8.2.2. Stateid Structure 8.2.2. Stateid Structure
Stateids are divided into two fields, a 96-bit "other" field Stateids are divided into two fields, a 96-bit "other" field
identifying the specific set of locks and a 32-bit "seqid" sequence identifying the specific set of locks and a 32-bit "seqid" sequence
value. Except in the case of special stateids, to be discussed value. Except in the case of special stateids, to be discussed
below, a particular value of the "other" field denotes a set of loc