draft-ietf-nfsv4-minorversion1-13.txt draft-ietf-nfsv4-minorversion1-14.txt
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: January 2, 2008 Editors Expires: March 28, 2008 Editors
September 25, 2007
NFSv4 Minor Version 1 NFSv4 Minor Version 1
draft-ietf-nfsv4-minorversion1-13.txt draft-ietf-nfsv4-minorversion1-14.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 33 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 2, 2008. This Internet-Draft will expire on March 28, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2007).
Abstract Abstract
This Internet-Draft describes NFSv4 minor version one, including This Internet-Draft describes NFSv4 minor version one, including
features retained from the base protocol and protocol extensions made features retained from the base protocol and protocol extensions made
subsequently. The current draft includes description of the major subsequently. The current draft includes description of the major
skipping to change at page 2, line 27 skipping to change at page 2, line 27
1.2. NFS Version 4 Goals . . . . . . . . . . . . . . . . . . 10 1.2. NFS Version 4 Goals . . . . . . . . . . . . . . . . . . 10
1.3. Minor Version 1 Goals . . . . . . . . . . . . . . . . . 11 1.3. Minor Version 1 Goals . . . . . . . . . . . . . . . . . 11
1.4. Overview of NFS version 4.1 Features . . . . . . . . . . 11 1.4. Overview of NFS version 4.1 Features . . . . . . . . . . 11
1.4.1. RPC and Security . . . . . . . . . . . . . . . . . . 12 1.4.1. RPC and Security . . . . . . . . . . . . . . . . . . 12
1.4.2. Protocol Structure . . . . . . . . . . . . . . . . . 12 1.4.2. Protocol Structure . . . . . . . . . . . . . . . . . 12
1.4.3. File System Model . . . . . . . . . . . . . . . . . 13 1.4.3. File System Model . . . . . . . . . . . . . . . . . 13
1.4.4. Locking Facilities . . . . . . . . . . . . . . . . . 14 1.4.4. Locking Facilities . . . . . . . . . . . . . . . . . 14
1.5. General Definitions . . . . . . . . . . . . . . . . . . 15 1.5. General Definitions . . . . . . . . . . . . . . . . . . 15
1.6. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 17 1.6. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 17
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 17 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 17
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 18 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 17
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 18 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 18 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 18
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 21 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 21
2.4. Client Identifiers and Client Owners . . . . . . . . . . 22 2.4. Client Identifiers and Client Owners . . . . . . . . . . 22
2.4.1. Server Release of Client ID . . . . . . . . . . . . 26 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 25
2.4.2. Resolving Client Owner Conflicts . . . . . . . . . . 26 2.4.2. Server Release of Client ID . . . . . . . . . . . . 26
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 26
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 27 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 27
2.6. Security Service Negotiation . . . . . . . . . . . . . . 28 2.6. Security Service Negotiation . . . . . . . . . . . . . . 28
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 28 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 28
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 28 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 28
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 29 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 29
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 32 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 32
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 34 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 34
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 34 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 34
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 35 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 35
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 35 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 35
skipping to change at page 3, line 9 skipping to change at page 3, line 10
2.9.2. Client and Server Transport Behavior . . . . . . . . 36 2.9.2. Client and Server Transport Behavior . . . . . . . . 36
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 37 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 37
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 37 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 37
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 37 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 37
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 38 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 38
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 40 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 40
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 41 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 41
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 44 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 44
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 56 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 56
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 59 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 59
2.10.8. Session Mechanics - Steady State . . . . . . . . . . 67 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 64
2.10.9. Session Mechanics - Recovery . . . . . . . . . . . . 69 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 68
2.10.10. Parallel NFS and Sessions . . . . . . . . . . . . . 72 2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 69
3. Protocol Data Types . . . . . . . . . . . . . . . . . . . . . 72 2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 73
3.1. Basic Data Types . . . . . . . . . . . . . . . . . . . . 72 3. Protocol Data Types . . . . . . . . . . . . . . . . . . . . . 73
3.2. Structured Data Types . . . . . . . . . . . . . . . . . 74 3.1. Basic Data Types . . . . . . . . . . . . . . . . . . . . 73
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.2. Structured Data Types . . . . . . . . . . . . . . . . . 75
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 84 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 84 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 85
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 85 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 85
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 85 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 86
4.2.1. General Properties of a Filehandle . . . . . . . . . 85 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 86
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 86 4.2.1. General Properties of a Filehandle . . . . . . . . . 86
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 86 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 87
4.3. One Method of Constructing a Volatile Filehandle . . . . 88 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 87
4.4. Client Recovery from Filehandle Expiration . . . . . . . 88 4.3. One Method of Constructing a Volatile Filehandle . . . . 89
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 89 4.4. Client Recovery from Filehandle Expiration . . . . . . . 89
5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 90 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 90
5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 91 5.1. Mandatory Attributes . . . . . . . . . . . . . . . . . . 91
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 91 5.2. Recommended Attributes . . . . . . . . . . . . . . . . . 92
5.4. Classification of Attributes . . . . . . . . . . . . . . 92 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 92
5.5. Mandatory Attributes - Definitions . . . . . . . . . . . 93 5.4. Classification of Attributes . . . . . . . . . . . . . . 93
5.6. Recommended Attributes - Definitions . . . . . . . . . . 94 5.5. Mandatory Attributes - Definitions . . . . . . . . . . . 94
5.7. Time Access . . . . . . . . . . . . . . . . . . . . . . 104 5.6. Recommended Attributes - Definitions . . . . . . . . . . 95
5.8. Interpreting owner and owner_group . . . . . . . . . . . 105 5.7. Time Access . . . . . . . . . . . . . . . . . . . . . . 106
5.9. Character Case Attributes . . . . . . . . . . . . . . . 107 5.8. Interpreting owner and owner_group . . . . . . . . . . . 107
5.10. Quota Attributes . . . . . . . . . . . . . . . . . . . . 107 5.9. Character Case Attributes . . . . . . . . . . . . . . . 109
5.11. mounted_on_fileid . . . . . . . . . . . . . . . . . . . 108 5.10. Quota Attributes . . . . . . . . . . . . . . . . . . . . 109
5.12. Directory Notification Attributes . . . . . . . . . . . 109 5.11. mounted_on_fileid . . . . . . . . . . . . . . . . . . . 110
5.12.1. dir_notif_delay . . . . . . . . . . . . . . . . . . 109 5.12. Directory Notification Attributes . . . . . . . . . . . 111
5.12.2. dirent_notif_delay . . . . . . . . . . . . . . . . . 109 5.12.1. dir_notif_delay . . . . . . . . . . . . . . . . . . 111
5.13. PNFS Attributes . . . . . . . . . . . . . . . . . . . . 109 5.12.2. dirent_notif_delay . . . . . . . . . . . . . . . . . 111
5.13.1. fs_layout_type . . . . . . . . . . . . . . . . . . . 109 5.13. PNFS Attributes . . . . . . . . . . . . . . . . . . . . 111
5.13.2. layout_alignment . . . . . . . . . . . . . . . . . . 109 5.13.1. fs_layout_type . . . . . . . . . . . . . . . . . . . 111
5.13.3. layout_blksize . . . . . . . . . . . . . . . . . . . 110 5.13.2. layout_alignment . . . . . . . . . . . . . . . . . . 111
5.13.4. layout_hint . . . . . . . . . . . . . . . . . . . . 110 5.13.3. layout_blksize . . . . . . . . . . . . . . . . . . . 112
5.13.5. layout_type . . . . . . . . . . . . . . . . . . . . 110 5.13.4. layout_hint . . . . . . . . . . . . . . . . . . . . 112
5.13.6. mdsthreshold . . . . . . . . . . . . . . . . . . . . 110 5.13.5. layout_type . . . . . . . . . . . . . . . . . . . . 112
5.14. Retention Attributes . . . . . . . . . . . . . . . . . . 111 5.13.6. mdsthreshold . . . . . . . . . . . . . . . . . . . . 112
6. Security Related Attributes . . . . . . . . . . . . . . . . . 113 5.14. Retention Attributes . . . . . . . . . . . . . . . . . . 113
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 113 6. Security Related Attributes . . . . . . . . . . . . . . . . . 115
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 114 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2.1. ACL Attributes . . . . . . . . . . . . . . . . . . . 114 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 116
6.2.2. dacl and sacl Attributes . . . . . . . . . . . . . . 127 6.2.1. ACL Attributes . . . . . . . . . . . . . . . . . . . 116
6.2.3. mode Attribute . . . . . . . . . . . . . . . . . . . 127 6.2.2. dacl and sacl Attributes . . . . . . . . . . . . . . 129
6.2.4. mode_set_masked Attribute . . . . . . . . . . . . . 128 6.2.3. mode Attribute . . . . . . . . . . . . . . . . . . . 129
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 129 6.2.4. mode_set_masked Attribute . . . . . . . . . . . . . 130
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 129 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 131
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 130 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 131
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 131 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 132
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 132 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 133
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 133 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 134
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 134 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 135
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 138 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 136
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 138 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 140
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 138 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 140
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 139 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 140
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 139 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 141
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 140 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 141
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 140 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 142
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 140 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 142
7.8. Security Policy and Namespace Presentation . . . . . . . 141 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 142
8. State Management . . . . . . . . . . . . . . . . . . . . . . 142 7.8. Security Policy and Namespace Presentation . . . . . . . 143
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 142 8. State Management . . . . . . . . . . . . . . . . . . . . . . 144
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 143 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 145
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 143 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 145
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 144 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 145
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 145 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 146
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 146 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 147
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 148 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 148
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 149 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 150
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 149 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 151
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 150 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 151
8.4.3. Network Partitions and Recovery . . . . . . . . . . 154 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 152
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 158 8.4.3. Network Partitions and Recovery . . . . . . . . . . 156
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 159 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 160
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 161
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 159 Expiration . . . . . . . . . . . . . . . . . . . . . . . 162
8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 160 8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 162
9. File Locking and Share Reservations . . . . . . . . . . . . . 161 9. File Locking and Share Reservations . . . . . . . . . . . . . 163
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 161 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 163
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 161 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 164
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 162 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 164
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 165 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 167
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 165 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 167
9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 166 9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 168
9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 167 9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 169
9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 167 9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 169
9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 168 9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 170
9.8. Reclaim of Open and Byte-range Locks . . . . . . . . . . 169 9.8. Reclaim of Open and Byte-range Locks . . . . . . . . . . 171
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 169 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 171
10.1. Performance Challenges for Client-Side Caching . . . . . 170 10.1. Performance Challenges for Client-Side Caching . . . . . 172
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 171 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 173
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 172 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 174
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 174 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 177
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 175 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 177
10.3.2. Data Caching and File Locking . . . . . . . . . . . 176 10.3.2. Data Caching and File Locking . . . . . . . . . . . 178
10.3.3. Data Caching and Mandatory File Locking . . . . . . 177 10.3.3. Data Caching and Mandatory File Locking . . . . . . 180
10.3.4. Data Caching and File Identity . . . . . . . . . . . 178 10.3.4. Data Caching and File Identity . . . . . . . . . . . 180
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 179 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 181
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 181 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 183
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 182 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 185
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 183 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 185
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 186 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 188
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 188 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 190
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 189 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 191
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 189 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 191
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 189 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 192
10.5.1. Revocation Recovery for Write Open Delegation . . . 190 10.5.1. Revocation Recovery for Write Open Delegation . . . 192
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 191 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 193
10.7. Data and Metadata Caching and Memory Mapped Files . . . 193 10.7. Data and Metadata Caching and Memory Mapped Files . . . 195
10.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 195 10.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 197
10.9. Directory Caching . . . . . . . . . . . . . . . . . . . 196 10.9. Directory Caching . . . . . . . . . . . . . . . . . . . 198
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 197 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 199
11.1. Location attributes . . . . . . . . . . . . . . . . . . 197 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 199
11.2. File System Presence or Absence . . . . . . . . . . . . 197 11.2. File System Presence or Absence . . . . . . . . . . . . 200
11.3. Getting Attributes for an Absent File System . . . . . . 199 11.3. Getting Attributes for an Absent File System . . . . . . 201
11.3.1. GETATTR Within an Absent File System . . . . . . . . 199 11.3.1. GETATTR Within an Absent File System . . . . . . . . 201
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 200 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 202
11.4. Uses of Location Information . . . . . . . . . . . . . . 201 11.4. Uses of Location Information . . . . . . . . . . . . . . 203
11.4.1. File System Replication . . . . . . . . . . . . . . 201 11.4.1. File System Replication . . . . . . . . . . . . . . 204
11.4.2. File System Migration . . . . . . . . . . . . . . . 203 11.4.2. File System Migration . . . . . . . . . . . . . . . 205
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 204 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 206
11.5. Additional Client-side Considerations . . . . . . . . . 205 11.5. Additional Client-side Considerations . . . . . . . . . 207
11.6. Effecting File System Transitions . . . . . . . . . . . 206 11.6. Effecting File System Transitions . . . . . . . . . . . 208
11.6.1. File System Transitions and Simultaneous Access . . 207 11.6.1. File System Transitions and Simultaneous Access . . 209
11.6.2. Simultaneous Use and Transparent Transitions . . . . 208 11.6.2. Simultaneous Use and Transparent Transitions . . . . 210
11.6.3. Filehandles and File System Transitions . . . . . . 210 11.6.3. Filehandles and File System Transitions . . . . . . 212
11.6.4. Fileid's and File System Transitions . . . . . . . . 210 11.6.4. Fileids and File System Transitions . . . . . . . . 213
11.6.5. Fsids and File System Transitions . . . . . . . . . 211 11.6.5. Fsids and File System Transitions . . . . . . . . . 214
11.6.6. The Change Attribute and File System Transitions . . 211 11.6.6. The Change Attribute and File System Transitions . . 215
11.6.7. Lock State and File System Transitions . . . . . . . 212 11.6.7. Lock State and File System Transitions . . . . . . . 215
11.6.8. Write Verifiers and File System Transitions . . . . 216 11.6.8. Write Verifiers and File System Transitions . . . . 219
11.7. Effecting File System Referrals . . . . . . . . . . . . 216 11.6.9. Readdir Cookies and Verifiers and File System
11.7.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 216 Transitions . . . . . . . . . . . . . . . . . . . . 219
11.7.2. Referral Example (READDIR) . . . . . . . . . . . . . 220 11.6.10. File System Data and File System Transitions . . . . 220
11.8. The Attribute fs_absent . . . . . . . . . . . . . . . . 223 11.7. Effecting File System Referrals . . . . . . . . . . . . 221
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 223 11.7.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 222
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 225 11.7.2. Referral Example (READDIR) . . . . . . . . . . . . . 225
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 228 11.8. The Attribute fs_locations . . . . . . . . . . . . . . . 228
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 233 11.9. The Attribute fs_locations_info . . . . . . . . . . . . 230
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 234 11.9.1. The fs_locations_server4 Structure . . . . . . . . . 233
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 235 11.9.2. The fs_locations_info4 Structure . . . . . . . . . . 239
12. Directory Delegations . . . . . . . . . . . . . . . . . . . . 239 11.9.3. The fs_locations_item4 Structure . . . . . . . . . . 240
12.1. Introduction to Directory Delegations . . . . . . . . . 239 11.10. The Attribute fs_status . . . . . . . . . . . . . . . . 242
12.2. Directory Delegation Design . . . . . . . . . . . . . . 240 12. Directory Delegations . . . . . . . . . . . . . . . . . . . . 245
12.3. Attributes in Support of Directory Notifications . . . . 241 12.1. Introduction to Directory Delegations . . . . . . . . . 245
12.4. Delegation Recall . . . . . . . . . . . . . . . . . . . 241 12.2. Directory Delegation Design . . . . . . . . . . . . . . 246
12.5. Directory Delegation Recovery . . . . . . . . . . . . . 241 12.3. Attributes in Support of Directory Notifications . . . . 247
13. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 241 12.4. Delegation Recall . . . . . . . . . . . . . . . . . . . 247
13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 241 12.5. Directory Delegation Recovery . . . . . . . . . . . . . 248
13.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 243 13. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 248
13.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 243 13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 248
13.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 243 13.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 250
13.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 244 13.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 250
13.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 244 13.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 250
13.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 244 13.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 251
13.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 244 13.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 251
13.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 244 13.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 251
13.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 245 13.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 251
13.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 245 13.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 251
13.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 246 13.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 252
13.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 246 13.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 252
13.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 247 13.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 253
13.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 247 13.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 253
13.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 248 13.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 254
13.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 249 13.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 254
13.5.3. Committing a Layout . . . . . . . . . . . . . . . . 250 13.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 254
13.5.4. Recalling a Layout . . . . . . . . . . . . . . . . . 253 13.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 256
13.5.5. Metadata Server Write Propagation . . . . . . . . . 259 13.5.3. Committing a Layout . . . . . . . . . . . . . . . . 257
13.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 259 13.5.4. Recalling a Layout . . . . . . . . . . . . . . . . . 259
13.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 260 13.5.5. Metadata Server Write Propagation . . . . . . . . . 265
13.7.1. Client Recovery . . . . . . . . . . . . . . . . . . 261 13.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 266
13.7.2. Dealing with Lease Expiration on the Client . . . . 261 13.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 267
13.7.1. Client Recovery . . . . . . . . . . . . . . . . . . 267
13.7.2. Dealing with Lease Expiration on the Client . . . . 268
13.7.3. Dealing with Loss of Layout State on the Metadata 13.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 263 Server . . . . . . . . . . . . . . . . . . . . . . . 269
13.7.4. Recovery from Metadata Server Restart . . . . . . . 263 13.7.4. Recovery from Metadata Server Restart . . . . . . . 270
13.7.5. Operations During Metadata Server Grace Period . . . 265 13.7.5. Operations During Metadata Server Grace Period . . . 272
13.7.6. Storage Device Recovery . . . . . . . . . . . . . . 266 13.7.6. Storage Device Recovery . . . . . . . . . . . . . . 272
13.8. Metadata and Storage Device Roles . . . . . . . . . . . 266 13.8. Metadata and Storage Device Roles . . . . . . . . . . . 273
13.9. Security Considerations . . . . . . . . . . . . . . . . 268 13.9. Security Considerations . . . . . . . . . . . . . . . . 274
14. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 269 14. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 275
14.1. Client ID and Session Considerations . . . . . . . . . . 269 14.1. Client ID and Session Considerations . . . . . . . . . . 275
14.2. File Layout Definitions . . . . . . . . . . . . . . . . 270 14.2. File Layout Definitions . . . . . . . . . . . . . . . . 277
14.3. File Layout Data Types . . . . . . . . . . . . . . . . . 271 14.3. File Layout Data Types . . . . . . . . . . . . . . . . . 278
14.4. Interpreting the File Layout . . . . . . . . . . . . . . 274 14.4. Interpreting the File Layout . . . . . . . . . . . . . . 280
14.5. Sparse and Dense Stripe Unit Packing . . . . . . . . . . 276 14.5. Sparse and Dense Stripe Unit Packing . . . . . . . . . . 283
14.6. Data Server Multipathing . . . . . . . . . . . . . . . . 277 14.6. Data Server Multipathing . . . . . . . . . . . . . . . . 284
14.7. Operations Issued to NFSv4.1 Data Servers . . . . . . . 278 14.7. Operations Issued to NFSv4.1 Data Servers . . . . . . . 285
14.8. COMMIT Through Metadata Server . . . . . . . . . . . . . 279 14.8. COMMIT Through Metadata Server . . . . . . . . . . . . . 285
14.9. The Layout Iomode . . . . . . . . . . . . . . . . . . . 280 14.9. The Layout Iomode . . . . . . . . . . . . . . . . . . . 286
14.10. Metadata and Data Server State Coordination . . . . . . 280 14.10. Metadata and Data Server State Coordination . . . . . . 287
14.10.1. Global Stateid Requirements . . . . . . . . . . . . 280 14.10.1. Global Stateid Requirements . . . . . . . . . . . . 287
14.10.2. Data Server State Propagation . . . . . . . . . . . 280 14.10.2. Data Server State Propagation . . . . . . . . . . . 287
14.11. Data Server Component File Size . . . . . . . . . . . . 283 14.11. Data Server Component File Size . . . . . . . . . . . . 289
14.12. Recovery from Loss of Layout . . . . . . . . . . . . . . 283 14.12. Recovery from Loss of Layout . . . . . . . . . . . . . . 290
14.13. Security Considerations for the File Layout Type . . . . 284 14.13. Security Considerations for the File Layout Type . . . . 291
15. Internationalization . . . . . . . . . . . . . . . . . . . . 284 15. Internationalization . . . . . . . . . . . . . . . . . . . . 291
15.1. Stringprep profile for the utf8str_cs type . . . . . . . 286 15.1. Stringprep profile for the utf8str_cs type . . . . . . . 292
15.2. Stringprep profile for the utf8str_cis type . . . . . . 287 15.2. Stringprep profile for the utf8str_cis type . . . . . . 294
15.3. Stringprep profile for the utf8str_mixed type . . . . . 289 15.3. Stringprep profile for the utf8str_mixed type . . . . . 295
15.4. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 290 15.4. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 297
16. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 290 16. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 297
16.1. Error Definitions . . . . . . . . . . . . . . . . . . . 291 16.1. Error Definitions . . . . . . . . . . . . . . . . . . . 297
16.2. Operations and their valid errors . . . . . . . . . . . 305 16.2. Operations and their valid errors . . . . . . . . . . . 312
16.3. Callback operations and their valid errors . . . . . . . 319 16.3. Callback operations and their valid errors . . . . . . . 326
16.4. Errors and the operations that use them . . . . . . . . 320 16.4. Errors and the operations that use them . . . . . . . . 327
17. NFS version 4.1 Procedures . . . . . . . . . . . . . . . . . 327 17. NFS version 4.1 Procedures . . . . . . . . . . . . . . . . . 334
17.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 327 17.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 335
17.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 328 17.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 335
18. NFS version 4.1 Operations . . . . . . . . . . . . . . . . . 333 18. NFS version 4.1 Operations . . . . . . . . . . . . . . . . . 340
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 333 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 340
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 335 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 342
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 337 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 344
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 339 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 346
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 342 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 349
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 343 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 350
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 343 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 350
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 345 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 352
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 346 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 353
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 347 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 354
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 351 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 358
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 352 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 359
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 354 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 361
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 356 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 363
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 357 Attributes . . . . . . . . . . . . . . . . . . . . . . . 364
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 358 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 365
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 373 Directory . . . . . . . . . . . . . . . . . . . . . . . 380
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 374 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 381
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 375 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 383
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 376 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 383
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 378 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 385
18.22. Operation 25: READ - Read from File . . . . . . . . . . 379 18.22. Operation 25: READ - Read from File . . . . . . . . . . 386
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 381 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 388
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 385 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 392
18.25. Operation 28: REMOVE - Remove File System Object . . . . 386 18.25. Operation 28: REMOVE - Remove File System Object . . . . 393
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 388 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 395
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 390 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 397
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 391 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 398
18.29. Operation 33: SECINFO - Obtain Available Security . . . 391 18.29. Operation 33: SECINFO - Obtain Available Security . . . 398
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 395 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 402
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 397 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 404
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 398 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 405
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 403 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 410
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 404 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 411
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 406 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 413
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 423 Confirm Client ID . . . . . . . . . . . . . . . . . . . 430
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 433 session . . . . . . . . . . . . . . . . . . . . . . . . 440
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 435 locks . . . . . . . . . . . . . . . . . . . . . . . . . 442
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 436 delegation . . . . . . . . . . . . . . . . . . . . . . . 443
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 440 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 447
18.41. Operation 48: GETDEVICELIST . . . . . . . . . . . . . . 441 18.41. Operation 48: GETDEVICELIST . . . . . . . . . . . . . . 448
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 442 a layout . . . . . . . . . . . . . . . . . . . . . . . . 449
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 445 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 452
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 448 Information . . . . . . . . . . . . . . . . . . . . . . 455
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 451 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 458
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 452 sequencing and control . . . . . . . . . . . . . . . . . 459
18.47. Operation 54: SET_SSV . . . . . . . . . . . . . . . . . 459 18.47. Operation 54: SET_SSV . . . . . . . . . . . . . . . . . 466
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 461 validity . . . . . . . . . . . . . . . . . . . . . . . . 468
18.49. Operation 56: WANT_DELEGATION . . . . . . . . . . . . . 462 18.49. Operation 56: WANT_DELEGATION . . . . . . . . . . . . . 470
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 465 client ID . . . . . . . . . . . . . . . . . . . . . . . 472
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 466 Finished . . . . . . . . . . . . . . . . . . . . . . . . 473
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 468 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 475
19. NFS version 4.1 Callback Procedures . . . . . . . . . . . . . 468 19. NFS version 4.1 Callback Procedures . . . . . . . . . . . . . 476
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 469 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 476
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 469 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 476
20. NFS version 4.1 Callback Operations . . . . . . . . . . . . . 471 20. NFS version 4.1 Callback Operations . . . . . . . . . . . . . 478
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 471 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 478
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 473 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 480
20.3. Operation 5: CB_LAYOUTRECALL . . . . . . . . . . . . . . 474 20.3. Operation 5: CB_LAYOUTRECALL . . . . . . . . . . . . . . 481
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 477 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 484
20.5. Operation 7: CB_PUSH_DELEG . . . . . . . . . . . . . . . 480 20.5. Operation 7: CB_PUSH_DELEG . . . . . . . . . . . . . . . 487
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 481 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 488
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL . . . . . . . . . . 484 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL . . . . . . . . . . 491
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 485 limits . . . . . . . . . . . . . . . . . . . . . . . . . 492
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 486 sequencing and control . . . . . . . . . . . . . . . . . 493
20.10. Operation 12: CB_WANTS_CANCELLED . . . . . . . . . . . . 489 20.10. Operation 12: CB_WANTS_CANCELLED . . . . . . . . . . . . 496
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 490 lock availability . . . . . . . . . . . . . . . . . . . 497
20.12. Operation 10044: CB_ILLEGAL - Illegal Callback 20.12. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 491 Operation . . . . . . . . . . . . . . . . . . . . . . . 498
21. Security Considerations . . . . . . . . . . . . . . . . . . . 492 21. Security Considerations . . . . . . . . . . . . . . . . . . . 499
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 492 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 499
22.1. Defining new layout types . . . . . . . . . . . . . . . 492 22.1. Defining new layout types . . . . . . . . . . . . . . . 499
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 493 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 500
23.1. Normative References . . . . . . . . . . . . . . . . . . 493 23.1. Normative References . . . . . . . . . . . . . . . . . . 500
23.2. Informative References . . . . . . . . . . . . . . . . . 494 23.2. Informative References . . . . . . . . . . . . . . . . . 502
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 496 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 503
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 497 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 504
Intellectual Property and Copyright Statements . . . . . . . . . 498 Intellectual Property and Copyright Statements . . . . . . . . . 506
1. Introduction 1. Introduction
1.1. The NFSv4.1 Protocol 1.1. The NFSv4.1 Protocol
The NFSv4.1 protocol is a minor version of the NFSv4 protocol The NFSv4.1 protocol is a minor version of the NFSv4 protocol
described in [2]. It generally follows the guidelines for minor described in [2]. It generally follows the guidelines for minor
versioning model laid out in Section 10 of RFC 3530. However, it versioning model laid out in Section 10 of RFC 3530. However, it
diverges from guidelines 11 ("a client and server that supports minor diverges from guidelines 11 ("a client and server that supports minor
version X must support minor versions 0 through X-1"), and 12 ("no version X must support minor versions 0 through X-1"), and 12 ("no
skipping to change at page 15, line 22 skipping to change at page 15, line 22
the holder that inconsistent directory modifications cannot occur the holder that inconsistent directory modifications cannot occur
so long as the delegation is held. so long as the delegation is held.
o Layouts which are recallable objects that assure the holder that o Layouts which are recallable objects that assure the holder that
direct access to the file data may be performed directly by the direct access to the file data may be performed directly by the
client and that no change to the data's location inconsistent with client and that no change to the data's location inconsistent with
that access may be made so long as the layout is held. that access may be made so long as the layout is held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When leases are not promptly renewed lock are renew that lease. When leases are not promptly renewed locks are
subject to revocation. In the event of server reinitialization, subject to revocation. In the event of server re-initialization,
clients have the opportunity to safely reclaim their locks within a clients have the opportunity to safely reclaim their locks within a
special grace period. special grace period.
1.5. General Definitions 1.5. General Definitions
The following definitions are provided for the purpose of providing The following definitions are provided for the purpose of providing
an appropriate context for the reader. an appropriate context for the reader.
Client The "client" is the entity that accesses the NFS server's Client The "client" is the entity that accesses the NFS server's
resources. The client may be an application which contains the resources. The client may be an application which contains the
logic to access the NFS server directly. The client may also be logic to access the NFS server directly. The client may also be
the traditional operating system client that provides remote file system the traditional operating system client that provides remote file system
services for a set of applications. services for a set of applications.
A client is uniquely identified by a Client Owner. A client is uniquely identified by a Client Owner.
In the case of file locking the client is the entity that With reference to file locking, the client is also the entity that
maintains a set of locks on behalf of one or more applications. maintains a set of locks on behalf of one or more applications.
This client is responsible for crash or failure recovery for those This client is responsible for crash or failure recovery for those
locks it manages. locks it manages.
Note that multiple clients may share the same transport and Note that multiple clients may share the same transport and
connection and multiple clients may exist on the same network connection and multiple clients may exist on the same network
node. node.
Client ID A 64-bit quantity used as a unique, short-hand reference Client ID A 64-bit quantity used as a unique, short-hand reference
to a client supplied Verifier and client owner. The server is to a client supplied Verifier and client owner. The server is
responsible for supplying the client ID. responsible for supplying the client ID.
Client Owner The client owner is a unique string, opaque to the Client Owner The client owner is a unique string, opaque to the
server, which identifies a client. Multiple network connections server, which identifies a client. Multiple network connections
and source network addresses originating those connections may and source network addresses originating from those connections
share a client owner. The server is expected to treat requests may share a client owner. The server is expected to treat
from connections with the same client owner has coming from the requests from connections with the same client owner as coming
same client. from the same client.
Lease An interval of time defined by the server for which the client Lease An interval of time defined by the server for which the client
is irrevocably granted a lock. At the end of a lease period the is irrevocably granted a lock. At the end of a lease period the
lock may be revoked if the lease has not been extended. The lock lock may be revoked if the lease has not been extended. The lock
must be revoked if a conflicting lock has been granted after the must be revoked if a conflicting lock has been granted after the
lease interval. lease interval.
All leases granted by a server have the same fixed interval. Note All leases granted by a server have the same fixed interval. Note
that the fixed interval was chosen to alleviate the expense a that the fixed interval was chosen to alleviate the expense a
server would have in maintaining state about variable length server would have in maintaining state about variable length
leases across server failures. leases across server failures.
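
The lease definition above can be illustrated with a minimal sketch. It assumes a single client-wide lease with a fixed interval, renewed by any request from the client; the class and method names (ClientLease, renew, locks_revocable) and the use of plain numeric timestamps are hypothetical and not part of the protocol.

```python
class ClientLease:
    """Hypothetical model of one client-wide lease with a fixed interval."""

    def __init__(self, lease_interval, now):
        # lease_interval is chosen by the server and is the same for all leases.
        self.lease_interval = lease_interval
        self.expires_at = now + lease_interval

    def renew(self, now):
        # Any request made on a session associated with the client renews the lease.
        self.expires_at = now + self.lease_interval

    def locks_revocable(self, now):
        # Once the lease period has ended without renewal, the server
        # may (but need not) revoke the client's locks.
        return now >= self.expires_at
```

For instance, with a 90 second interval granted at time 0 and a renewal at time 60, the locks become revocable at time 150 rather than at time 90.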
Lock The term "lock" is used to refer to any of record (octet-range) Lock The term "lock" is used to refer to record (octet-range) locks,
locks, share reservations, delegations or layouts unless share reservations, delegations or layouts unless specifically
specifically stated otherwise. stated otherwise.
Server The "Server" is the entity responsible for coordinating Server The "Server" is the entity responsible for coordinating
client access to a set of file systems. A server can span client access to a set of file systems and is identified by a
multiple network addresses. In NFSv4.1, a server is a two tiered Server owner. A server can span multiple network addresses.
entity allows for servers consisting of multiple components the
flexibility to tightly or loosely couple their components without
requiring tight synchronization among the components. Every
server has a "Server Owner" which reflects the two tiers of a
server entity.
Server Owner The "Server Owner" identifies the server to the client. Server Owner The "Server Owner" identifies the server to the client.
The server owner consists of a major and minor identifier. When The server owner consists of a major and minor identifier. When
the client has two connections each to a peer with the same major the client has two connections each to a peer with the same major
and minor identifier, the client assumes both peers are the same identifier, the client assumes both peers are the same server (the
server (the server namespace is the same via each connection), and server namespace is the same via each connection), and assumes that
further assumes session and lock state is sharable across both lock state is sharable across both connections. When each peer has
connections. When each peer has the same major identifier but both the same major and minor identifier, the client assumes each
different minor identifier, the client assumes both peers can connection might be associatable with the same session.
serve the same namespace, but session and lock state is not
sharable across both connections.
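
As a rough illustration of the Server Owner definition, the sketch below follows the revised wording (the right-hand column): a matching major identifier implies the same server and sharable lock state, and a matching major and minor identifier additionally means the connections might be associated with the same session. The names ServerOwner, major_id, minor_id and compare_peers are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    # Hypothetical field names for the major and minor identifiers.
    major_id: bytes
    minor_id: int


def compare_peers(a, b):
    """Return what a client may assume about two peers it is connected to."""
    same_major = a.major_id == b.major_id
    same_minor = a.minor_id == b.minor_id
    return {
        # Same major identifier: same server and namespace, lock state sharable.
        "same_server": same_major,
        "lock_state_sharable": same_major,
        # Same major and minor identifier: the connections might be
        # associated with the same session.
        "session_may_be_shared": same_major and same_minor,
    }
```

The result is only what the client may assume from the identifiers; it is not a guarantee that a given session can in fact be shared.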
Stable Storage NFS version 4 servers must be able to recover without Stable Storage NFS version 4 servers must be able to recover without
data loss from multiple power failures (including cascading power data loss from multiple power failures (including cascading power
failures, that is, several power failures in quick succession), failures, that is, several power failures in quick succession),
operating system failures, and hardware failure of components operating system failures, and hardware failure of components
other than the storage medium itself (for example, disk, other than the storage medium itself (for example, disk,
nonvolatile RAM). nonvolatile RAM).
Some examples of stable storage that are allowable for an NFS Some examples of stable storage that are allowable for an NFS
server include: server include:
skipping to change at page 18, line 28 skipping to change at page 18, line 22
Previous NFS versions have been thought of as having a host-based Previous NFS versions have been thought of as having a host-based
authentication model, where the NFS server authenticates the NFS authentication model, where the NFS server authenticates the NFS
client, and trusts the client to authenticate all users. Actually, client, and trusts the client to authenticate all users. Actually,
NFS has always depended on RPC for authentication. The first form of NFS has always depended on RPC for authentication. The first form of
RPC authentication required a host-based authentication RPC authentication required a host-based authentication
approach. NFSv4.1 also depends on RPC for basic security services, approach. NFSv4.1 also depends on RPC for basic security services,
and mandates RPC support for a user-based authentication model. The and mandates RPC support for a user-based authentication model. The
user-based authentication model has user principals authenticated by user-based authentication model has user principals authenticated by
a server, and in turn the server authenticated by user principals. a server, and in turn the server authenticated by user principals.
RPC provides some basic security services which are used by NFSv4. RPC provides some basic security services which are used by NFSv4.1.
2.2.1.1. RPC Security Flavors 2.2.1.1. RPC Security Flavors
As described in section 7.2 "Authentication" of [4], RPC security is As described in section 7.2 "Authentication" of [4], RPC security is
encapsulated in the RPC header, via a security or authentication encapsulated in the RPC header, via a security or authentication
flavor, and information specific to the specification of the security flavor, and information specific to the specification of the security
flavor. Every RPC header conveys information used to identify and flavor. Every RPC header conveys information used to identify and
authenticate a client and server. As discussed in Section 2.2.1.1.1, authenticate a client and server. As discussed in Section 2.2.1.1.1,
some security flavors provide additional security services. some security flavors provide additional security services.
skipping to change at page 22, line 20 skipping to change at page 22, line 20
Except for a small number of operations needed for session creation, Except for a small number of operations needed for session creation,
server requests and callback requests are performed within the server requests and callback requests are performed within the
context of a session. Sessions provide a client context for every context of a session. Sessions provide a client context for every
request and support robust reply protection for non-idempotent request and support robust reply protection for non-idempotent
requests. requests.
2.4. Client Identifiers and Client Owners 2.4. Client Identifiers and Client Owners
For each operation that obtains or depends on locking state, the For each operation that obtains or depends on locking state, the
specific client must be determinable by the server. In NFSv4, each specific client must be identifiable by the server.
distinct client instance is represented by a client ID, which is a
64-bit identifier that identifies a specific client at a given time
and which is changed whenever the client re-initializes, and may
change when the server re-initializes. Client IDs are used to
support lock identification and crash recovery.
In NFSv4.1, during steady state operation, the client ID associated Each distinct client instance is represented by a client ID. A
with each operation is derived from the session (see Section 2.10) on client ID is a 64-bit identifier that represents a specific client at a
which the operation is issued. Each session is associated with a given time. The client ID is changed whenever the client re-
specific client ID at session creation and that client ID then initializes, and may change when the server re-initializes. Client
becomes the client ID associated with all requests issued using it. IDs are used to support lock identification and crash recovery.
Therefore, unlike NFSv4.0, the only NFSv4.1 operations possible
before a client ID is established are those needed to establish the During steady state operation, the client ID associated with each
client ID. operation is derived from the session (see Section 2.10) on which the
operation is issued. A session is associated with a client ID when
the session is created.
Unlike NFSv4.0, the only NFSv4.1 operations possible before a client
ID is established are those needed to establish the client ID.
A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION
operation using that client ID (eir_clientid as returned from operation using that client ID (eir_clientid as returned from
EXCHANGE_ID) is required to establish the identification on the EXCHANGE_ID) is required to establish and confirm the client ID on
server. Establishment of identification by a new incarnation of the the server. Establishment of identification by a new incarnation of
client also has the effect of immediately releasing any locking state the client also has the effect of immediately releasing any locking
that a previous incarnation of that same client might have had on the state that a previous incarnation of that same client might have had
server. Such released state would include all lock, share on the server. Such released state would include all lock, share
reservation, layout state, and where the server is not supporting the reservation, layout state, and where the server is not supporting the
CLAIM_DELEGATE_PREV claim type, all delegation state associated with CLAIM_DELEGATE_PREV claim type, all delegation state associated with
same client with the same identity. For discussion of delegation the same client with the same identity. For discussion of delegation
state recovery, see Section 10.2.1. For discussion of layout state state recovery, see Section 10.2.1. For discussion of layout state
recovery see Section 13.7.1. recovery see Section 13.7.1.
Releasing such state requires that the server be able to determine Releasing such state requires that the server be able to determine
that one client instance is the successor of another. Where this that one client instance is the successor of another. Where this
cannot be done, for any of a number of reasons, the locking state cannot be done, for any of a number of reasons, the locking state
will remain for a time subject to lease expiration (see Section 8.3) will remain for a time subject to lease expiration (see Section 8.3)
and the new client will need to wait for such state to be removed, if and the new client will need to wait for such state to be removed, if
it makes conflicting lock requests. it makes conflicting lock requests.
Client identification is encapsulated in the following Client Owner Client identification is encapsulated in the following Client Owner
structure: structure:
struct client_owner4 { struct client_owner4 {
verifier4 co_verifier; verifier4 co_verifier;
opaque co_ownerid<NFS4_OPAQUE_LIMIT>; opaque co_ownerid<NFS4_OPAQUE_LIMIT>;
}; };
The first field, co_verifier, is a client incarnation verifier that The first field, co_verifier, is a client incarnation verifier. The
is used to detect client reboots. Only if the co_verifier is server will start the process of canceling the client's leased state
different from that the server had previously recorded for the client if co_verifier is different than what the server has previously
(as identified by the second field of the structure, co_ownerid) does recorded for the identified client (as specified in the co_ownerid
the server start the process of canceling the client's leased state. field).
The second field, co_ownerid, is a variable length string that The second field, co_ownerid, is a variable length string that
uniquely defines the client so that subsequent instances of the same uniquely defines the client so that subsequent instances of the same
client bear the same co_ownerid with a different verifier. client bear the same co_ownerid with a different verifier.
There are several considerations for how the client generates the There are several considerations for how the client generates the
co_ownerid string: co_ownerid string:
o The string should be unique so that multiple clients do not o The string should be unique so that multiple clients do not
present the same string. The consequences of two clients present the same string. The consequences of two clients
presenting the same string range from one client getting an error presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly to one client having its leased state abruptly and unexpectedly
canceled. canceled.
o The string should be selected so the subsequent incarnations (e.g. o The string should be selected so the subsequent incarnations (e.g.
reboots) of the same client cause the client to present the same restarts) of the same client cause the client to present the same
string. The implementor is cautioned against an approach that string. The implementor is cautioned against an approach that
requires the string to be recorded in a local file because this requires the string to be recorded in a local file because this
precludes the use of the implementation in an environment where precludes the use of the implementation in an environment where
there is no local disk and all file access is from an NFS version there is no local disk and all file access is from an NFS version
4 server. 4 server.
o The string should be the same for each server network address that o The string should be the same for each server network address that
the client accesses (note: the precise opposite was advised in the client accesses (note: the precise opposite was advised in
the NFSv4.0 specification [2]). This way, if a server has the NFSv4.0 specification [2]). This way, if a server has
multiple interfaces, the client can trunk traffic over multiple multiple interfaces, the client can trunk traffic over multiple
skipping to change at page 24, line 43 skipping to change at page 24, line 43
* A true random number. However since this number ought to be * A true random number. However since this number ought to be
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
o For a user level NFS version 4 client, it should contain o For a user level NFS version 4 client, it should contain
additional information to distinguish the client from other user additional information to distinguish the client from other user
level clients running on the same host, such as a process level clients running on the same host, such as a process
identifier or other unique sequence. identifier or other unique sequence.
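The considerations above for generating co_ownerid can be illustrated with a minimal sketch. It is not part of either draft. The function names, the "/" separator, and the use of a caller-supplied incarnation stamp are assumptions made for illustration; the two size constants are the values used by the NFSv4 XDR definitions.

```python
import struct

NFS4_OPAQUE_LIMIT = 1024   # maximum length of co_ownerid
NFS4_VERIFIER_SIZE = 8     # length of a verifier4


def make_co_ownerid(host_identity: str, user_level_tag: str = "") -> bytes:
    """Build a co_ownerid from a stable host identity (hypothetical helper).

    The server's network address is deliberately not an input, so the same
    string is presented to every server address the client accesses.
    """
    parts = [host_identity]
    if user_level_tag:
        # Distinguishes user-level clients that share one host.
        parts.append(user_level_tag)
    ownerid = "/".join(parts).encode("utf-8")
    if len(ownerid) > NFS4_OPAQUE_LIMIT:
        raise ValueError("co_ownerid exceeds NFS4_OPAQUE_LIMIT")
    return ownerid


def make_co_verifier(incarnation_stamp: int) -> bytes:
    """Pack a per-incarnation value (for example a boot timestamp) into 8 bytes."""
    return struct.pack(">Q", incarnation_stamp % (1 << 64))
```

The same host identity yields the same co_ownerid on every incarnation, while a new incarnation stamp yields a different co_verifier.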
A server may compare a client_owner4 in an EXCHANGE_ID with an The client ID is assigned by the server (the eir_clientid result from
nfs_client_id4 established using SETCLIENTID using NFSv4 minor EXCHANGE_ID) and should be chosen so that it will not conflict with a
version 0, so that an NFSv4.1 client is not forced to delay until client ID previously assigned by the server. This applies across
lease expiration for locking state established by the earlier client server restarts.
using minor version 0. This requires the client_owner4 be
constructed the same way as the nfs_client_id4. If the latter's
contents included the server's network address, and the NFSv4.1
client does not wish to use a client ID that prevents trunking, it
should issue two EXCHANGE_ID operations. The first EXCHANGE_ID will
have a client_owner4 equal to the nfs_client_id4. This will clear
the state created by the NFSv4.0 client. The second EXCHANGE_ID will
not have the server's network address. The state created for the
second EXCHANGE_ID will not have to wait for lease expiration,
because there will be no state to expire.
Once an EXCHANGE_ID has been done, and the resulting client ID
established as associated with a session, all requests made on that
session implicitly identify that client ID, which in turn designates
the client specified using the long-form client_owner4 structure.
The shorthand client identifier (a client ID) is assigned by the
server (the eir_clientid result from EXCHANGE_ID) and should be
chosen so that it will not conflict with a client ID previously
assigned by the server. This applies across server restarts or
reboots.
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
current client ID is no longer valid when receives a current client ID is no longer valid when it receives a
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend of NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether the characteristics of the sessions involved, specifically whether
the session is persistent (see Section 2.10.5.5). the session is persistent (see Section 2.10.5.5).
When a session is not persistent, the client will need to create a When a session is not persistent, the client will need to create a
new session. When the existing client ID is presented to a server as new session. When the existing client ID is presented to a server as
part of creating a session and that client ID is not recognized, as part of creating a session and that client ID is not recognized, as
would happen after a server reboot, the server will reject the would happen after a server restart, the server will reject the
request with the error NFS4ERR_STALE_CLIENTID. When this happens, request with the error NFS4ERR_STALE_CLIENTID. When this happens,
the client must obtain a new client ID by use of the EXCHANGE_ID the client must obtain a new client ID by use of the EXCHANGE_ID
operation and then use that client ID as the basis of the basis of a operation, then use that client ID as the basis of a new session, and
new session and then proceed to any other necessary recovery for the then proceed to any other necessary recovery for the server restart
server reboot case (See Section 8.4.2). case (See Section 8.4.2).
In the case of the session being persistent, the client will re- In the case of the session being persistent, the client will re-
establish communication using the existing session after the reboot. establish communication using the existing session after the restart.
This session will be associated with a client ID that has had state This session will be associated with a client ID that has had state
revoked (but the persistent session is never associated with a stale revoked (but the persistent session is never associated with a stale
client ID, because if the session is persistent, the client ID MUST client ID, because if the session is persistent, the client ID MUST
persist), and the client will receive an indication of that fact in persist), and the client will receive an indication of that fact via
the sr_status_flags field returned by the SEQUENCE operation (see the SEQ4_STATUS_RESTART_RECLAIM_NEEDED flag returned in the
Section 18.46.4). The client can then use the existing session to do sr_status_flags field of the SEQUENCE operation (see Section 18.46.4).
whatever operations are necessary to determine the status of requests The client can then use the existing session to do whatever
outstanding at the time of reboot, while avoiding issuing new operations are necessary to determine the status of requests
outstanding at the time of restart, while avoiding issuing new
requests, particularly any involving locking on that session. Such requests, particularly any involving locking on that session. Such
requests would fail with an NFS4ERR_STALE_STATEID error, if requests would fail with an NFS4ERR_STALE_STATEID error, if
attempted. attempted.
See the detailed descriptions of EXCHANGE_ID (Section 18.35) and See the detailed descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these CREATE_SESSION (Section 18.36) for a complete specification of these
operations. operations.
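The two server restart cases described above (non-persistent and persistent sessions) can be summarized in a small sketch. It is illustrative only: the step names are hypothetical labels, not protocol operations, except where they name EXCHANGE_ID and CREATE_SESSION.

```python
from typing import List


def client_recovery_steps(session_persistent: bool) -> List[str]:
    """Return the client's next steps after a server restart (sketch)."""
    if not session_persistent:
        # Creating a session with the old client ID fails with
        # NFS4ERR_STALE_CLIENTID, so the client starts over.
        return [
            "EXCHANGE_ID",      # obtain a new client ID
            "CREATE_SESSION",   # new session based on the new client ID
            "RECOVER_STATE",    # other recovery for the restart case (Section 8.4.2)
        ]
    # Persistent session: the session and client ID survive, but state was
    # revoked; the client learns this from sr_status_flags in SEQUENCE.
    return [
        "CHECK_OUTSTANDING_REQUESTS",  # on the existing session
        "AVOID_NEW_LOCKING_REQUESTS",  # these would fail with NFS4ERR_STALE_STATEID
    ]
```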
2.4.1. Server Release of Client ID 2.4.1. Upgrade from NFSv4.0 to NFSv4.1
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established
using SETCLIENTID using NFSv4.0, so that an NFSv4.1 client is not
forced to delay until lease expiration for locking state established
by the earlier client using minor version 0. This requires the
client_owner4 be constructed the same way as the nfs_client_id4. If
the latter's contents included the server's network address, and the
NFSv4.1 client does not wish to use a client ID that prevents
trunking, it should issue two EXCHANGE_ID operations. The first
EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4.
This will clear the state created by the NFSv4.0 client. The second
EXCHANGE_ID will not have the server's network address. The state
created for the second EXCHANGE_ID will not have to wait for lease
expiration, because there will be no state to expire.
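The upgrade procedure above can be sketched as a plan of co_ownerid values to send in successive EXCHANGE_ID operations. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, and the caller is assumed to know whether its NFSv4.0 identifier embedded the server's network address.

```python
from typing import List


def upgrade_owner_ids(v40_id: bytes,
                      v40_id_has_server_addr: bool,
                      want_trunking: bool,
                      trunkable_id: bytes) -> List[bytes]:
    """Return the co_ownerid for each EXCHANGE_ID to issue, in order."""
    # First EXCHANGE_ID: client_owner4 equal to the nfs_client_id4, so the
    # server can clear the state created by the NFSv4.0 client.
    plan = [v40_id]
    if v40_id_has_server_addr and want_trunking:
        # Second EXCHANGE_ID: an identifier without the server's network
        # address, so it does not prevent trunking.
        plan.append(trunkable_id)
    return plan
```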
2.4.2. Server Release of Client ID
NFSv4.1 introduces a new operation called DESTROY_CLIENTID NFSv4.1 introduces a new operation called DESTROY_CLIENTID
(Section 18.50) which the client SHOULD use to destroy a client ID it (Section 18.50) which the client SHOULD use to destroy a client ID it
no longer needs. This permits graceful, bilateral release of a no longer needs. This permits graceful, bilateral release of a
client ID. client ID. The operation cannot be used if there are sessions
associated with the client ID, or state with an unexpired lease.
If the server determines that the client holds no associated state If the server determines that the client holds no associated state
for its client ID (including sessions, opens, locks, delegations, for its client ID (including sessions, opens, locks, delegations,
layouts, and wants), the server may choose to unilaterally release layouts, and wants), the server may choose to unilaterally release
the client ID. The server may make this choice for an inactive the client ID. The server may make this choice for an inactive
client so that resources are not consumed by those intermittently client so that resources are not consumed by those intermittently
active clients. If the client contacts the server after this active clients. If the client contacts the server after this
release, the server must ensure the client receives the appropriate release, the server must ensure the client receives the appropriate
error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to
establish a new identity. It should be clear that the server must be establish a new identity. It should be clear that the server must be
very hesitant to release a client ID since the resulting work on the very hesitant to release a client ID since the resulting work on the
client to recover from such an event will be the same burden as if client to recover from such an event will be the same burden as if
the server had failed and restarted. Typically a server would not the server had failed and restarted. Typically a server would not
release a client ID unless there had been no activity from that release a client ID unless there had been no activity from that
client for many minutes. As long as there are sessions, opens, client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.9.1.4 for discussion on releasing the client ID. See Section 2.10.10.1.4 for discussion on releasing
inactive sessions. inactive sessions.
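The server-side rule for unilateral release can be sketched as a single predicate. This is an illustration, not a prescribed algorithm: the function name, the dictionary of state counts, and the 600 second default (standing in for "many minutes") are assumptions.

```python
from typing import Dict

STATE_KINDS = ("sessions", "opens", "locks", "delegations", "layouts", "wants")


def may_release_client_id(state_counts: Dict[str, int],
                          idle_seconds: float,
                          idle_threshold: float = 600.0) -> bool:
    """Decide whether the server may unilaterally release a client ID."""
    if any(state_counts.get(kind, 0) > 0 for kind in STATE_KINDS):
        # The server MUST NOT release the client ID while any state exists.
        return False
    # With no state, release only after a long period of inactivity.
    return idle_seconds >= idle_threshold
```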
2.4.2. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or if it has state, but the lease has expired, server has no state, or if it has state, but the lease has expired, the
MUST allow the EXCHANGE_ID, and confirm the new client ID if followed server MUST allow the EXCHANGE_ID, and confirm the new client ID if
by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a new incarnation of a client
has state and an unexpired lease, the server MUST NOT destroy any owner that currently has an old incarnation with state and an
state that currently exists for the client owner unless one of the unexpired lease, the server is allowed to dispose of the state of the
following is true: previous incarnation of the client owner if one of the following is
true:
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
the same as the principal that is issuing the EXCHANGE_ID. Note the same as the principal that is issuing the EXCHANGE_ID. Note
that if the client ID was created with SP4_MACH_CRED protection that if the client ID was created with SP4_MACH_CRED protection
(Section 18.35), the principal MUST be based on RPCSEC_GSS (Section 18.35), the principal MUST be based on RPCSEC_GSS
authentication, the RPCSEC_GSS service used MUST be integrity or authentication, the RPCSEC_GSS service used MUST be integrity or
privacy, and the same GSS mechanism and principal must be used as privacy, and the same GSS mechanism and principal must be used as
that used when the client ID was created. that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35), and the client sends the EXCHANGE_ID with the (Section 18.35, Section 2.10.7.3) and the client sends the
security flavor set to RPCSEC_GSS using the GSS SSV mechanism EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
(Section 2.10.7.4). Note that this is possible only if the server GSS SSV mechanism (Section 2.10.8).
and client persist the SSV.
o The client ID was established with SP4_SSV protection. Because o The client ID was established with SP4_SSV protection. Because
the SSV might not be persisted across client and server restart, the SSV might not be persisted across client and server restart,
and because the first time a client issues EXCHANGE_ID to a server and because the first time a client issues EXCHANGE_ID to a server
it does not have an SSV, the client MAY issue the subsequent it does not have an SSV, the client MAY issue the subsequent
EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with
SP4_MACH_CRED protection, the principal MUST be based on SP4_MACH_CRED protection, the principal MUST be based on
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
integrity or privacy, and the same GSS mechanism and principal integrity or privacy, and the same GSS mechanism and principal
must be used as that used when the client ID was created. must be used as that used when the client ID was created.
If the none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
Even the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
which created the client ID, it MUST NOT delete any state unless the which created the client ID, it deletes state (upon a
co_verifier in the EXCHANGE_ID does not match the co_verifier used CREATE_SESSION confirming the client id) if the co_verifier in the
when client ID was created. If the co_verifier matches, then the EXCHANGE_ID differs from the co_verifier used when the client ID was
client is either updating properties of the client ID, or possibly created. If the co_verifier values are the same, then the client is
attempting trunking opportunity (Section 2.10.4). either updating properties of the client ID (Section 18.35), or
possibly attempting trunking (Section 2.10.4) and the server MUST NOT
delete state.
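The conflict resolution rules in this section can be sketched as one decision function. This is one reading of the text, not a normative algorithm. The record and request types, their field names, and the returned labels (other than NFS4ERR_CLID_INUSE) are hypothetical; the principal string is assumed to include the GSS mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientRecord:
    principal: str        # principal (and GSS mechanism) that created the client ID
    verifier: bytes       # co_verifier recorded at creation
    protection: str       # "SP4_NONE", "SP4_MACH_CRED", or "SP4_SSV"
    has_state: bool
    lease_expired: bool


@dataclass
class ExchangeIdRequest:
    principal: str
    verifier: bytes
    gss_integrity_or_privacy: bool   # RPCSEC_GSS with integrity or privacy
    uses_ssv_mechanism: bool         # RPCSEC_GSS using the GSS SSV mechanism


def resolve_owner_conflict(record: Optional[ClientRecord],
                           req: ExchangeIdRequest) -> str:
    # No state, or state whose lease has expired: the request is allowed.
    if record is None or not record.has_state or record.lease_expired:
        return "ALLOW"
    same_principal = req.principal == record.principal
    if record.protection == "SP4_SSV":
        accepted = req.uses_ssv_mechanism or (
            same_principal and req.gss_integrity_or_privacy)
    elif record.protection == "SP4_MACH_CRED":
        accepted = same_principal and req.gss_integrity_or_privacy
    else:
        accepted = same_principal
    if not accepted:
        return "NFS4ERR_CLID_INUSE"
    if req.verifier != record.verifier:
        # New incarnation: state goes away once CREATE_SESSION confirms.
        return "DELETE_STATE_ON_CONFIRM"
    # Same verifier: property update or trunking attempt; keep state.
    return "KEEP_STATE"
```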
2.5. Server Owners 2.5. Server Owners
The Server Owner is somewhat similar to a Client Owner (Section 2.4), The Server Owner is somewhat similar to a Client Owner (Section 2.4),
but unlike the Client Owner, there is no shorthand serverid. The but unlike the Client Owner, there is no shorthand serverid. The
Server Owner is defined in the following structure: Server Owner is defined in the following structure:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
skipping to change at page 42, line 18 skipping to change at page 42, line 23
different EXCHANGE_ID requests, and the eir_clientid, different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and
eir_server_scope results match in both EXCHANGE_ID results, then eir_server_scope results match in both EXCHANGE_ID results, then
the client is permitted to perform session trunking. If the the client is permitted to perform session trunking. If the
client has no session mapping to the tuple of eir_clientid, client has no session mapping to the tuple of eir_clientid,
eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_major_id, eir_server_scope,
eir_server_owner.so_minor_id, then it creates the session via a eir_server_owner.so_minor_id, then it creates the session via a
CREATE_SESSION operation over one of the connections, which CREATE_SESSION operation over one of the connections, which
associates the connection to the session. If there is a session associates the connection to the session. If there is a session
for the tuple, the client can issue BIND_CONN_TO_SESSION to for the tuple, the client can issue BIND_CONN_TO_SESSION to
associate the connection to the session. The client can invoke associate the connection to the session. Or if the client does
CREATE_SESSION regardless whether there is session for the tuple. not want to use session trunking, it can invoke CREATE_SESSION on
The second connection is associated with the same session as the the connection.
first connection via the BIND_CONN_TO_SESSION operation.
Client ID Trunking If the eia_clientowner argument is the same in Client ID Trunking If the eia_clientowner argument is the same in
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. Of course, even if the eir_server_owner.so_minor_id server. Of course, even if the eir_server_owner.so_minor_id
fields do match, the client is free to employ client ID trunking fields do match, the client is free to employ client ID trunking
skipping to change at page 43, line 18 skipping to change at page 43, line 22
SP4_MACH_CRED (Section 18.35) state protection options. For SP4_MACH_CRED (Section 18.35) state protection options. For
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (Section 18.47) SSV) that is established via the SET_SSV (Section 18.47)
operation. operation.
When a new connection is associated with the session (via the When a new connection is associated with the session (via the
BIND_CONN_TO_SESSION operation, see Section 18.34), if the client BIND_CONN_TO_SESSION operation, see Section 18.34), if the client
specified SP4_SSV state protection for the BIND_CONN_TO_SESSION specified SP4_SSV state protection for the BIND_CONN_TO_SESSION
operation, the client MUST issue the BIND_CONN_TO_SESSION with operation, the client MUST issue the BIND_CONN_TO_SESSION with
RPCSEC_GSS protection, using integrity or privacy, and a RPCSEC_GSS protection, using integrity or privacy, and a
RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.7.4). The RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The
RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36). RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
attempt, the RPCSEC_GSS verifier it computes in the response will attempt, the RPCSEC_GSS verifier it computes in the response will
not be verified by the client, so the client will know it cannot use not be verified by the client, so the client will know it cannot use
skipping to change at page 44, line 5 skipping to change at page 44, line 9
EXCHANGE_ID. Each time an EXCHANGE_ID is issued with RPCSEC_GSS EXCHANGE_ID. Each time an EXCHANGE_ID is issued with RPCSEC_GSS
authentication, the client notes the principal name of the GSS authentication, the client notes the principal name of the GSS
target. If the EXCHANGE_ID results indicate client ID trunking is target. If the EXCHANGE_ID results indicate client ID trunking is
possible, and the GSS targets' principal names are the same, the possible, and the GSS targets' principal names are the same, the
servers are the same and client ID trunking is allowed. servers are the same and client ID trunking is allowed.
The second option for verification is to use SP4_SSV protection. The second option for verification is to use SP4_SSV protection.
When the client issues EXCHANGE_ID it specifies SP4_SSV When the client issues EXCHANGE_ID it specifies SP4_SSV
protection. The first EXCHANGE_ID the client issues always has to protection. The first EXCHANGE_ID the client issues always has to
be confirmed by a CREATE_SESSION call. The client then issues be confirmed by a CREATE_SESSION call. The client then issues
SET_SSV on the sessions. Later the client issues EXCHANGE_ID to a SET_SSV. Later the client issues EXCHANGE_ID to a different
different destination network address than the first EXCHANGE_ID was destination network address than the first EXCHANGE_ID was issued
issued with. The client checks that each EXCHANGE_ID reply has with. The client checks that each EXCHANGE_ID reply has the same
the same eir_clientid, eir_server_owner.so_major_id, and eir_clientid, eir_server_owner.so_major_id, and eir_server_scope.
eir_server_scope. If so, the client verifies the claim by issuing If so, the client verifies the claim by issuing a CREATE_SESSION
a CREATE_SESSION to the second destination address, protected with to the second destination address, protected with RPCSEC_GSS
RPCSEC_GSS integrity using an RPCSEC_GSS handle returned by the integrity using an RPCSEC_GSS handle returned by the second
second EXCHANGE_ID. If the server accepts the CREATE_SESSION EXCHANGE_ID. If the server accepts the CREATE_SESSION request, and
request, and if the client verifies the RPCSEC_GSS verifier and if the client verifies the RPCSEC_GSS verifier and integrity
integrity codes, then the client has proof the second server knows codes, then the client has proof the second server knows the SSV,
the SSV, and thus the two servers are the same for the purposes of and thus the two servers are the same for the purposes of client
client ID trunking. ID trunking.
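The rules given earlier in this section for when session trunking and client ID trunking are permitted can be sketched as a comparison of two EXCHANGE_ID exchanges. The sketch covers only the comparison of arguments and results; it does not perform the SP4_MACH_CRED or SP4_SSV verification described above. The type, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdOutcome:
    client_owner: bytes   # eia_clientowner argument sent in the request
    clientid: int         # eir_clientid
    major_id: bytes       # eir_server_owner.so_major_id
    minor_id: int         # eir_server_owner.so_minor_id
    scope: bytes          # eir_server_scope


def trunking_permitted(a: ExchangeIdOutcome, b: ExchangeIdOutcome) -> str:
    """Classify two EXCHANGE_ID outcomes as "session", "client_id", or "none"."""
    same_base = (a.client_owner == b.client_owner
                 and a.clientid == b.clientid
                 and a.major_id == b.major_id
                 and a.scope == b.scope)
    if not same_base:
        return "none"
    if a.minor_id == b.minor_id:
        # Session trunking is permitted; the client remains free to use
        # client ID trunking instead.
        return "session"
    return "client_id"
```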
2.10.5. Exactly Once Semantics 2.10.5. Exactly Once Semantics
Via the session, NFSv4.1 offers exactly once semantics (EOS) for Via the session, NFSv4.1 offers exactly once semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is issued with a leading Each COMPOUND or CB_COMPOUND request that is issued with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the request is exactly once. This requirement holds regardless of whether the request is
skipping to change at page 48, line 4 skipping to change at page 48, line 8
request), nonetheless there are considerations for the XID in NFSv4.1 request), nonetheless there are considerations for the XID in NFSv4.1
that are the same as all other previous versions of NFS. The RPC XID that are the same as all other previous versions of NFS. The RPC XID
remains in each message and must be formulated in NFSv4.1 requests as remains in each message and must be formulated in NFSv4.1 requests as
in any other ONC RPC request. The reasons include: in any other ONC RPC request. The reasons include:
o The RPC layer retains its existing semantics and implementation. o The RPC layer retains its existing semantics and implementation.
o The requester and replier must be able to interoperate at the RPC o The requester and replier must be able to interoperate at the RPC
layer, prior to the NFSv4.1 decoding of the SEQUENCE or layer, prior to the NFSv4.1 decoding of the SEQUENCE or
CB_SEQUENCE operation. CB_SEQUENCE operation.
o If an operation is being used that does not start with SEQUENCE or o If an operation is being used that does not start with SEQUENCE or
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot id, sequence id, and sessionid (if present) so, the embedded slot id, sequence id, and sessionid (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to to match the reply to the request. only the XID to match the reply to the request.
Given that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the sessionid in the reply means the
requester does not have to use the XID to look up the sessionid, which requester does not have to use the XID to look up the sessionid, which
would be necessary if the connection were associated with multiple would be necessary if the connection were associated with multiple
sessions. Having the slot id and sequence id in the reply means sessions. Having the slot id and sequence id in the reply means
the requester does not have to use the XID to look up the slot id and the requester does not have to use the XID to look up the slot id and
sequence id. Furthermore, since the XID is only 32 bits, it is too sequence id. Furthermore, since the XID is only 32 bits, it is too
small to guarantee the re-association of a reply with its request small to guarantee the re-association of a reply with its request
skipping to change at page 59, line 38 skipping to change at page 59, line 38
the client need not provide a target GSS principal for the the client need not provide a target GSS principal for the
backchannel as it did with NFSv4.0, nor does the server have to implement backchannel as it did with NFSv4.0, nor does the server have to implement
an RPCSEC_GSS initiator as it did with NFSv4.0 [2]. an RPCSEC_GSS initiator as it did with NFSv4.0 [2].
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Section 18.35 Also note that the SP4_SSV state protection mode (see Section 18.35
and Section 2.10.7.3) has the side benefit of providing SSV-derived and Section 2.10.7.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.7.4). RPCSEC_GSS contexts (Section 2.10.8).
2.10.7.3. Protection from Unauthorized State Changes 2.10.7.3. Protection from Unauthorized State Changes
As described to this point in the specification, the state model of As described to this point in the specification, the state model of
NFSv4.1 is vulnerable to an attacker that issues a SEQUENCE operation NFSv4.1 is vulnerable to an attacker that issues a SEQUENCE operation
with a forged sessionid and with a slot id that it expects the with a forged sessionid and with a slot id that it expects the
legitimate client to use next. When the legitimate client uses the legitimate client to use next. When the legitimate client uses the
slot id with the same sequence number, the server returns the slot id with the same sequence number, the server returns the
attacker's result from the reply cache which disrupts the legitimate attacker's result from the reply cache which disrupts the legitimate
client and thus denies service to it. Similarly an attacker could client and thus denies service to it. Similarly an attacker could
skipping to change at page 61, line 31 skipping to change at page 61, line 31
3. The physical client has multiple users, but the client 3. The physical client has multiple users, but the client
implementation has a unique client ID for each user. This is implementation has a unique client ID for each user. This is
effectively the same as the second scenario, but a disadvantage effectively the same as the second scenario, but a disadvantage
is that each user must be allocated at least one session each, so is that each user must be allocated at least one session each, so
the approach suffers from lack of economy. the approach suffers from lack of economy.
The SP4_SSV protection option uses a Secret State Verifier (SSV) The SP4_SSV protection option uses a Secret State Verifier (SSV)
which is shared between a client and server. The SSV serves as the which is shared between a client and server. The SSV serves as the
secret key for an internal (that is, internal to NFSv4.1) GSS secret key for an internal (that is, internal to NFSv4.1) GSS
mechanism that uses the secret key for Message Integrity Code (MIC) mechanism that uses the secret key for Message Integrity Code (MIC)
and Wrap tokens (Section 2.10.7.4). The SP4_SSV protection option is and Wrap tokens (Section 2.10.8). The SP4_SSV protection option is
intended for the client that has multiple users, and the system intended for the client that has multiple users, and the system
administrator does not wish to configure a permanent machine administrator does not wish to configure a permanent machine
credential for each client. The SSV is established on the server via credential for each client. The SSV is established on the server via
SET_SSV (see Section 18.47). To prevent eavesdropping, a client SET_SSV (see Section 18.47). To prevent eavesdropping, a client
SHOULD issue SET_SSV via RPCSEC_GSS with the privacy service. SHOULD issue SET_SSV via RPCSEC_GSS with the privacy service.
Several aspects of the SSV make it intractable for an attacker to Several aspects of the SSV make it intractable for an attacker to
guess the SSV, and thus associate rogue connections with a session, guess the SSV, and thus associate rogue connections with a session,
and rogue sessions with a client ID: and rogue sessions with a client ID:
o The arguments to and results of SET_SSV include digests of the old o The arguments to and results of SET_SSV include digests of the old
and new SSV, respectively. and new SSV, respectively.
o Because the initial value of the SSV is zero, therefore known, the o Because the initial value of the SSV is zero, therefore known, the
client that opts for SP4_SSV protection and opts to apply SP4_SSV client that opts for SP4_SSV protection and opts to apply SP4_SSV
protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST issue protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST issue
at least one SET_SSV operation before the first at least one SET_SSV operation before the first
BIND_CONN_TO_SESSION operation or before the second CREATE_SESSION BIND_CONN_TO_SESSION operation or before the second CREATE_SESSION
operation on a client ID. If it does not, the SSV mechanism will operation on a client ID. If it does not, the SSV mechanism will
not generate tokens (Section 2.10.7.4). A client SHOULD issue not generate tokens (Section 2.10.8). A client SHOULD issue
SET_SSV as soon as a session is created. SET_SSV as soon as a session is created.
o A SET_SSV does not replace the SSV with the argument to SET_SSV. o A SET_SSV does not replace the SSV with the argument to SET_SSV.
Instead, the current SSV on the server is logically exclusive ORed Instead, the current SSV on the server is logically exclusive ORed
(XORed) with the argument to SET_SSV. SET_SSV MUST NOT be called (XORed) with the argument to SET_SSV. Each time a new principal
with an SSV value that is zero. For this reason, each time a new uses a client ID for the first time, the client SHOULD issue a
principal uses a client ID for the first time, the client SHOULD SET_SSV with that principal's RPCSEC_GSS credentials, with
issue a SET_SSV with that principal's RPCSEC_GSS credentials, with
RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY.
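The XOR update described in the last bullet above can be sketched as follows. This is a minimal illustration in Python, not part of the protocol definition: the function name apply_set_ssv, the 4-octet values, and the equal-length check are assumptions made for the example.

```python
def apply_set_ssv(current_ssv: bytes, ssv_arg: bytes) -> bytes:
    """Return the new SSV: the current SSV XORed with the SET_SSV argument."""
    # Assumption for this sketch: both values have the same length.
    if len(current_ssv) != len(ssv_arg):
        raise ValueError("SSV and SET_SSV argument must have the same length")
    return bytes(a ^ b for a, b in zip(current_ssv, ssv_arg))


# The initial SSV is zero, so the first SET_SSV yields its own argument.
ssv = apply_set_ssv(bytes(4), bytes.fromhex("a1b2c3d4"))
# A later SET_SSV, issued under another principal, mixes in a second value.
ssv = apply_set_ssv(ssv, bytes.fromhex("0f0f0f0f"))
```

Because each new SSV depends on the previous one, knowing a single SET_SSV argument is not enough to reconstruct the SSV unless every earlier argument is also known.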
Here are the types of attacks that can be attempted by an attacker Here are the types of attacks that can be attempted by an attacker
named Eve on a victim named Bob, and how SP4_SSV protection foils named Eve on a victim named Bob, and how SP4_SSV protection foils
each attack: each attack:
o Suppose Eve is the first user to log into a legitimate client. o Suppose Eve is the first user to log into a legitimate client.
Eve's use of an NFSv4.1 file system will cause an SSV to be Eve's use of an NFSv4.1 file system will cause an SSV to be
created via the legitimate client's NFSv4.1 implementation. The created via the legitimate client's NFSv4.1 implementation. The
SET_SSV that creates the SSV will be protected by the RPCSEC_GSS SET_SSV that creates the SSV will be protected by the RPCSEC_GSS
skipping to change at page 64, line 21 skipping to change at page 64, line 19
If the goal of a counter threat strategy is to prevent a connection If the goal of a counter threat strategy is to prevent a connection
hijacker from making unauthorized state changes, then the hijacker from making unauthorized state changes, then the
SP4_MACH_CRED protection approach can be used with a client ID per SP4_MACH_CRED protection approach can be used with a client ID per
user (i.e. the aforementioned third scenario for machine credential user (i.e. the aforementioned third scenario for machine credential
state protection). Each EXCHANGE_ID can specify that all operations state protection). Each EXCHANGE_ID can specify that all operations
MUST be protected with the machine credential. The server will then MUST be protected with the machine credential. The server will then
reject any subsequent operations on the client ID that do not use reject any subsequent operations on the client ID that do not use
RPCSEC_GSS with privacy or integrity and do not use the same RPCSEC_GSS with privacy or integrity and do not use the same
credential that created the client ID. credential that created the client ID.
2.10.7.4. The SSV GSS Mechanism 2.10.8. The SSV GSS Mechanism
The SSV provides the secret key for a mechanism that NFSv4.1 uses for The SSV provides the secret key for a mechanism that NFSv4.1 uses for
state protection. Contexts for this mechanism are not established state protection. Contexts for this mechanism are not established
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage (emitted by GSS_Wrap). SealedMessage (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define
any initial context tokens, the OID can be used to let servers any initial context tokens, the OID can be used to let servers
indicate that the SSV mechanism is acceptable whenever the client indicate that the SSV mechanism is acceptable whenever the client
issues a SECINFO or SECINFO_NO_NAME operation (see Section 2.6). issues a SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked the subkeys are recalculated by the
client and server. The four subkeys are calculated from each of
the valid ssv_subkey4 enumerated values. The calculation uses the
HMAC ([12]) algorithm, using the current SSV as the key, the one way
hash algorithm as negotiated by EXCHANGE_ID, and the input text as
represented by the XDR encoded enumeration of type ssv_subkey4.
/* Input for computing subkeys */
enum ssv_subkey4 {
SSV4_SUBKEY_MIC_I2T = 1,
SSV4_SUBKEY_MIC_T2I = 2,
SSV4_SUBKEY_SEAL_I2T = 3,
SSV4_SUBKEY_SEAL_T2I = 4
};
The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating
message integrity codes (MICs) that originate from the NFSv4.1
client, whether as part of a request over the fore channel, or a
response over the backchannel. The subkey derived from
SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1
server. The subkey derived from SSV4_SUBKEY_SEAL_I2T is used for
encrypting text originating from the NFSv4.1 client and the subkey
derived from SSV4_SUBKEY_SEAL_T2I is used for encrypting text
originating from the NFSv4.1 server.
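The subkey derivation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative implementation: SHA-256 stands in for whatever one way hash EXCHANGE_ID negotiated, and the names derive_subkeys and SSV_SUBKEY4 are hypothetical. The enumeration values are those of ssv_subkey4, and an XDR enumeration is encoded as a 4-octet big-endian signed integer.

```python
import hashlib
import hmac
import struct

# Values of the ssv_subkey4 enumeration.
SSV_SUBKEY4 = {
    "SSV4_SUBKEY_MIC_I2T": 1,
    "SSV4_SUBKEY_MIC_T2I": 2,
    "SSV4_SUBKEY_SEAL_I2T": 3,
    "SSV4_SUBKEY_SEAL_T2I": 4,
}


def derive_subkeys(ssv: bytes, digest=hashlib.sha256) -> dict:
    """Derive the four subkeys from the current SSV.

    The hash is a parameter because the real one is negotiated by
    EXCHANGE_ID; SHA-256 is only an assumed default for this sketch.
    """
    subkeys = {}
    for name, value in SSV_SUBKEY4.items():
        text = struct.pack(">i", value)  # XDR encoding of the enum value
        subkeys[name] = hmac.new(ssv, text, digest).digest()
    return subkeys


# Example SSV (arbitrary octets chosen for illustration).
subkeys = derive_subkeys(bytes(range(32)))
```

Both client and server would rerun this derivation after every SET_SSV, since the key of the HMAC is the current SSV.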
The field smt_hmac is an HMAC calculated by using the subkey derived
from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one
way hash algorithm as negotiated by EXCHANGE_ID, and the input text
as represented by data of type ssv_mic_plain_tkn4. The field
smpt_ssv_seq is the same as smt_ssv_seq. The field smpt_orig_plain is
the input text as passed into GSS_GetMIC().
The PerMsgToken description is based on an XDR definition: The PerMsgToken description is based on an XDR definition:
/* Input for computing smt_hmac */ /* Input for computing smt_hmac */
struct ssv_mic_plain_tkn4 { struct ssv_mic_plain_tkn4 {
uint32_t smpt_ssv_seq; uint32_t smpt_ssv_seq;
opaque smpt_orig_plain<>; opaque smpt_orig_plain<>;
}; };
/* SSV GSS PerMsgToken token */ /* SSV GSS PerMsgToken token */
struct ssv_mic_tkn4 { struct ssv_mic_tkn4 {
uint32_t smt_ssv_seq; uint32_t smt_ssv_seq;
opaque smt_hmac<>; opaque smt_hmac<>;
}; };
The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type
ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence
number which is equal to 1 after SET_SSV is called the first time on number which is equal to 1 after SET_SSV (Section 18.47) is called
a client ID. Thereafter, it is incremented on each SET_SSV. Thus the first time on a client ID. Thereafter, it is incremented on each
smt_ssv_seq represents the version of the SSV at the time SET_SSV. Thus smt_ssv_seq represents the version of the SSV at the
GSS_GetMIC() was called. This allows the SSV to be changed without time GSS_GetMIC() was called. As noted in Section 18.35, the client
serializing all RPC calls that use the SSV mechanism with SET_SSV and server can maintain multiple concurrent versions of the SSV.
operations. This allows the SSV to be changed without serializing all RPC calls
that use the SSV mechanism with SET_SSV operations.
The field smt_hmac is an HMAC ([12]), calculated by using the current
SSV as the key, the one way hash algorithm as negotiated by
EXCHANGE_ID, and the input text as represented by data of type
ssv_mic_plain_tkn4. The field smpt_ssv_seq is the same as
smt_ssv_seq. The field smpt_orig_plain is the input text as passed
into GSS_GetMIC().
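The construction of the PerMsgToken can be sketched in Python as follows. This is a hedged illustration, not the definitive implementation: SHA-256 is an assumed stand-in for the negotiated hash, the HMAC key is supplied by the caller (the current SSV or the applicable MIC subkey, depending on which revision of the text is followed), and the helper names xdr_uint32, xdr_opaque, get_mic, and verify_mic are hypothetical. The keys_by_seq mapping models a receiver that keeps several SSV versions at once, indexed by smt_ssv_seq.

```python
import hashlib
import hmac
import struct


def xdr_uint32(value: int) -> bytes:
    """XDR encoding of a uint32_t: 4 octets, big-endian."""
    return struct.pack(">I", value)


def xdr_opaque(data: bytes) -> bytes:
    """XDR encoding of opaque<>: length, data, zero padding to 4 octets."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint32(len(data)) + data + b"\x00" * pad


def get_mic(key: bytes, ssv_seq: int, message: bytes,
            digest=hashlib.sha256) -> bytes:
    """Build an XDR encoded ssv_mic_tkn4 for the given message."""
    # ssv_mic_plain_tkn4: smpt_ssv_seq followed by smpt_orig_plain
    plain = xdr_uint32(ssv_seq) + xdr_opaque(message)
    mac = hmac.new(key, plain, digest).digest()
    # ssv_mic_tkn4: smt_ssv_seq followed by smt_hmac
    return xdr_uint32(ssv_seq) + xdr_opaque(mac)


def verify_mic(keys_by_seq: dict, token: bytes, message: bytes,
               digest=hashlib.sha256) -> bool:
    """Check a token against the key version named by its smt_ssv_seq."""
    (ssv_seq,) = struct.unpack(">I", token[:4])
    key = keys_by_seq.get(ssv_seq)
    if key is None:
        return False  # that SSV version is not (or no longer) known
    expected = get_mic(key, ssv_seq, message, digest)
    return hmac.compare_digest(token, expected)


token = get_mic(b"k" * 32, 1, b"hello")
```

Carrying the sequence number in the token is what lets a receiver pick the right key version while a SET_SSV is in flight.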
The SealedMessage description is based on an XDR definition: The SealedMessage description is based on an XDR definition:
/* Input for computing ssct_encr_data and ssct_hmac */ /* Input for computing ssct_encr_data and ssct_hmac */
struct ssv_seal_plain_tkn4 { struct ssv_seal_plain_tkn4 {
opaque sspt_confounder<>; opaque sspt_confounder<>;
uint32_t sspt_ssv_seq; uint32_t sspt_ssv_seq;
opaque sspt_orig_plain<>; opaque sspt_orig_plain<>;
opaque sspt_pad<>; opaque sspt_pad<>;
}; };
skipping to change at page 66, line 7 skipping to change at page 66, line 28
opaque ssct_hmac<>; opaque ssct_hmac<>;
}; };
The token emitted by GSS_Wrap() is XDR encoded and of XDR data type The token emitted by GSS_Wrap() is XDR encoded and of XDR data type
ssv_seal_cipher_tkn4. ssv_seal_cipher_tkn4.
The ssct_ssv_seq field has the same meaning as smt_ssv_seq. The ssct_ssv_seq field has the same meaning as smt_ssv_seq.
The ssct_encr_data field is the result of encrypting a value of the The ssct_encr_data field is the result of encrypting a value of the
XDR encoded data type ssv_seal_plain_tkn4. The encryption key is the XDR encoded data type ssv_seal_plain_tkn4. The encryption key is the
SSV, and the encryption algorithm is that negotiated by EXCHANGE_ID. subkey derived from SSV4_SUBKEY_SEAL_I2T or SSV4_SUBKEY_SEAL_T2I, and
the encryption algorithm is that negotiated by EXCHANGE_ID.
The ssct_iv field is the initialization vector (IV) for the The ssct_iv field is the initialization vector (IV) for the
encryption algorithm (if applicable) and is sent in clear text. The encryption algorithm (if applicable) and is sent in clear text. The
content and size of the IV MUST comply with the specification of the content and size of the IV MUST comply with the specification of the
encryption algorithm. For example, the id-aes256-CBC algorithm MUST encryption algorithm. For example, the id-aes256-CBC algorithm MUST
use a 16 octet initialization vector (IV) which MUST be unpredictable use a 16 octet initialization vector (IV) which MUST be unpredictable
for each instance of a value of type ssv_seal_plain_tkn4 that is for each instance of a value of type ssv_seal_plain_tkn4 that is
encrypted with a particular SSV key. encrypted with a particular SSV key.
The ssct_hmac field is the result of computing an HMAC using the value of The ssct_hmac field is the result of computing an HMAC using the value of
the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The
key is the SSV, and the one way hash algorithm is that negotiated by key is the subkey derived from SSV4_SUBKEY_MIC_I2T or
EXCHANGE_ID. SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that
negotiated by EXCHANGE_ID.
The sspt_confounder field is a random value. The sspt_confounder field is a random value.
The sspt_ssv_seq field is the same as ssct_ssv_seq. The sspt_ssv_seq field is the same as ssct_ssv_seq.
The sspt_orig_plain field is the original plaintext as passed to The sspt_orig_plain field is the original plaintext as passed to
GSS_Wrap(). GSS_Wrap().
The sspt_pad field is present to support encryption algorithms that The sspt_pad field is present to support encryption algorithms that
require inputs to be in fixed sized blocks. The content of sspt_pad require inputs to be in fixed sized blocks. The content of sspt_pad
skipping to change at page 67, line 6 skipping to change at page 67, line 34
total encoding of 16 octets. The total number of XDR encoded octets total encoding of 16 octets. The total number of XDR encoded octets
is thus 8 + 4 + 20 + 16 = 48. is thus 8 + 4 + 20 + 16 = 48.
GSS_Wrap() emits a token that is an XDR encoding of a value of data GSS_Wrap() emits a token that is an XDR encoding of a value of data
type ssv_seal_cipher_tkn4. Note that regardless of whether the caller type ssv_seal_cipher_tkn4. Note that regardless of whether the caller
of GSS_Wrap() requests confidentiality or not, the token always has of GSS_Wrap() requests confidentiality or not, the token always has
confidentiality. This is because the SSV mechanism is for confidentiality. This is because the SSV mechanism is for
RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without
confidentiality. confidentiality.
Effectively there is a single GSS context for all RPCSEC_GSS handles Effectively there is a single GSS context for a single client ID.
that have been created on a session. And all sessions associated All RPCSEC_GSS handles share the same GSS context. SSV GSS contexts
with a client ID share the same SSV. SSV GSS contexts do not do not expire except when the SSV is destroyed (causes would include
expire except when the SSV is destroyed (causes would include the the client ID being destroyed or a server restart). Since one
client ID being destroyed or a server restart). Since one purpose of purpose of context expiration is to replace keys that have been in
context expiration is to replace keys that have been in use for "too use for "too long" hence vulnerable to compromise by brute force or
long" hence vulnerable to compromise by brute force or accident, the accident, the client can issue periodic SET_SSV operations, by
client can issue periodic SET_SSV operations, by cycling through cycling through different users' RPCSEC_GSS credentials. This way
different users' RPCSEC_GSS credentials. This way the SSV is the SSV is replaced without destroying the SSV's GSS contexts.
replaced without destroying the SSV's GSS contexts. If for some
reason SSV RPCSEC_GSS handles expire, the EXCHANGE_ID operation can
be used to create more SSV RPCSEC_GSS handles.
The client MUST establish an SSV via SET_SSV before the GSS context SSV RPCSEC_GSS handles can be expired or deleted by the server at any
can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). If time and the EXCHANGE_ID operation can be used to create more SSV
SET_SSV has not been successfully called, attempts to emit tokens RPCSEC_GSS handles.
The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
The SSV mechanism does not support replay detection and sequencing in The SSV mechanism does not support replay detection and sequencing in
its tokens because RPCSEC_GSS does not use those features (Section its tokens because RPCSEC_GSS does not use those features (See
5.2.2 "Context Creation Requests" in [5]). Section 5.2.2 "Context Creation Requests" in [5]).
2.10.8. Session Mechanics - Steady State 2.10.9. Session Mechanics - Steady State
2.10.8.1. Obligations of the Server 2.10.9.1. Obligations of the Server
The server has the primary obligation to monitor the state of The server has the primary obligation to monitor the state of
backchannel resources that the client has created for the server backchannel resources that the client has created for the server
(RPCSEC_GSS contexts and backchannel connections). If these (RPCSEC_GSS contexts and backchannel connections). If these
resources vanish, the server takes action as specified in resources vanish, the server takes action as specified in
Section 2.10.9.2. Section 2.10.10.2.
2.10.8.2. Obligations of the Client 2.10.9.2. Obligations of the Client
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client o Keep a necessary session from going idle on the server. A client
that requires a session, but nonetheless is not sending operations that requires a session, but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull a session that has not been used for a long force the server to cull a session that has not been used for a long
time. [[Comment.6: Tom Talpey disagrees and thinks a server can time. [[Comment.6: Tom Talpey disagrees and thinks a server can
never cull a session. Mike Eisler doesn't know what the server is never cull a session. Mike Eisler doesn't know what the server is
supposed to do when it accumulates a zillion reply caches that no supposed to do when it accumulates a zillion reply caches that no
client has touched in a century. :-)]] client has touched in a century. :-)]]
o Destroy the session when not needed. If a client has multiple o Destroy the session when not needed. If a client has multiple
sessions and one of them has no requests waiting for replies, and sessions and one of them has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts for the callback channel. If the client o Maintain GSS contexts for the backchannel. If the client requires
requires the server to use the RPCSEC_GSS security flavor for the server to use the RPCSEC_GSS security flavor for callbacks,
callbacks, then it needs to be sure the contexts handed to the then it needs to be sure the contexts handed to the server via
server via BACKCHANNEL_CTL are unexpired. BACKCHANNEL_CTL are unexpired.
o Preserve a connection for a backchannel. The server requires a o Preserve a connection for a backchannel. The server requires a
backchannel in order to gracefully recall recallable state, or backchannel in order to gracefully recall recallable state, or
notify the client of certain events. Note that if the connection notify the client of certain events. Note that if the connection
is not being used for the fore channel, there is no way for the client is not being used for the fore channel, there is no way for the client
to tell if the connection is still alive (e.g., the server rebooted to tell if the connection is still alive (e.g., the server restarted
without sending a disconnect). The onus is on the server, not the without sending a disconnect). The onus is on the server, not the
client, to determine if the backchannel's connection is alive, and client, to determine if the backchannel's connection is alive, and
to indicate in the response to a SEQUENCE operation when the last to indicate in the response to a SEQUENCE operation when the last
connection associated with a session's backchannel has connection associated with a session's backchannel has
disconnected. disconnected.
2.10.8.3. Steps the Client Takes To Establish a Session 2.10.9.3. Steps the Client Takes To Establish a Session
If the client does not have a client ID, the client issues If the client does not have a client ID, the client issues
EXCHANGE_ID to establish a client ID. If it opts for SP4_MACH_CRED EXCHANGE_ID to establish a client ID. If it opts for SP4_MACH_CRED
or SP4_SSV protection, in the spo_must_enforce list of operations, it or SP4_SSV protection, in the spo_must_enforce list of operations, it
SHOULD at minimum specify: CREATE_SESSION, DESTROY_SESSION, SHOULD at minimum specify: CREATE_SESSION, DESTROY_SESSION,
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts
for SP4_SSV protection, the client needs to ask for SSV-based for SP4_SSV protection, the client needs to ask for SSV-based
RPCSEC_GSS handles. RPCSEC_GSS handles.
The client uses the client ID to issue a CREATE_SESSION on a The client uses the client ID to issue a CREATE_SESSION on a
connection to the server. The results of CREATE_SESSION indicate connection to the server. The results of CREATE_SESSION indicate
whether the server will persist the session reply cache through a whether the server will persist the session reply cache through a
server reboot or not, and the client notes this for future reference. server restart or not, and the client notes this for future
reference.
If the client specified SP4_SSV state protection when the client ID If the client specified SP4_SSV state protection when the client ID
was created, then it SHOULD issue SET_SSV in the first COMPOUND after was created, then it SHOULD issue SET_SSV in the first COMPOUND after
the session is created. Each time a new principal goes to use the the session is created. Each time a new principal goes to use the
client ID, it SHOULD issue a SET_SSV again. client ID, it SHOULD issue a SET_SSV again.
If the client wants to use delegations, layouts, directory If the client wants to use delegations, layouts, directory
notifications, or any other state that requires a backchannel, then notifications, or any other state that requires a backchannel, then
it must add a connection to the backchannel if CREATE_SESSION did not it must add a connection to the backchannel if CREATE_SESSION did not
already do so. The client creates a connection, and calls already do so. The client creates a connection, and calls
skipping to change at page 69, line 18 skipping to change at page 69, line 48
If the client wants to use additional connections for the If the client wants to use additional connections for the
backchannel, then it must call BIND_CONN_TO_SESSION on each backchannel, then it must call BIND_CONN_TO_SESSION on each
connection it wants to use with the session. If the client wants to connection it wants to use with the session. If the client wants to
use additional connections for the fore channel, then it must call use additional connections for the fore channel, then it must call
BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state
protection when the client ID was created. protection when the client ID was created.
At this point the session has reached steady state. At this point the session has reached steady state.
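The ordering of operations described in this section can be summarized with a small Python sketch. It covers only the steps visible in this excerpt and is not a complete client: the function name establishment_ops and its parameters are hypothetical, and None is used here to mean that no state protection was requested.

```python
def establishment_ops(has_client_id, protection, wants_backchannel,
                      backchannel_bound_by_create_session,
                      extra_fore_connections=0):
    """Return the operations a client issues, in order, to reach steady state.

    protection is "SP4_SSV", "SP4_MACH_CRED", or None (assumed spelling
    for 'no state protection' in this sketch).
    """
    ops = []
    if not has_client_id:
        ops.append("EXCHANGE_ID")
    ops.append("CREATE_SESSION")
    if protection == "SP4_SSV":
        # SHOULD be issued in the first COMPOUND after session creation.
        ops.append("SET_SSV")
    if wants_backchannel and not backchannel_bound_by_create_session:
        ops.append("BIND_CONN_TO_SESSION")
    if protection in ("SP4_SSV", "SP4_MACH_CRED"):
        # Additional fore channel connections must be bound explicitly.
        ops.extend(["BIND_CONN_TO_SESSION"] * extra_fore_connections)
    return ops


plan = establishment_ops(False, "SP4_SSV", True, False, 1)
```

With no state protection, extra fore channel connections add nothing to the plan, which mirrors the condition stated in the text.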
2.10.9. Session Mechanics - Recovery 2.10.10. Session Mechanics - Recovery
2.10.9.1. Events Requiring Client Action 2.10.10.1. Events Requiring Client Action
The following events require client action to recover. The following events require client action to recover.
2.10.9.1.1. RPCSEC_GSS Context Loss by Callback Path 2.10.10.1.1. RPCSEC_GSS Context Loss by Callback Path
If all RPCSEC_GSS contexts granted by the client to the server for If all RPCSEC_GSS contexts granted by the client to the server for
callback use have expired, the client MUST establish a new context callback use have expired, the client MUST establish a new context
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE
results indicates when callback contexts are nearly expired, or fully results indicates when callback contexts are nearly expired, or fully
expired (see Section 18.46.4). expired (see Section 18.46.4).
2.10.9.1.2. Connection Loss 2.10.10.1.2. Connection Loss
If the client loses the last connection of the session, and if it wants If the client loses the last connection of the session, and if it wants
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of the connection If there was a request outstanding at the time of the connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.5.2. Note that it is
skipping to change at page 70, line 10 skipping to change at page 70, line 43
disconnect. disconnect.
If the connection that was lost was the last one associated with the If the connection that was lost was the last one associated with the
backchannel, and the client wants to retain the backchannel and/or backchannel, and the client wants to retain the backchannel and/or
not put recallable state subject to revocation, the client must not put recallable state subject to revocation, the client must
reconnect, and if it does, it MUST associate the connection to the reconnect, and if it does, it MUST associate the connection to the
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD
indicate when it has no callback connection via the sr_status_flags indicate when it has no callback connection via the sr_status_flags
result from SEQUENCE. result from SEQUENCE.
2.10.9.1.3. Backchannel GSS Context Loss 2.10.10.1.3. Backchannel GSS Context Loss
Via the sr_status_flags result of the SEQUENCE operation or other Via the sr_status_flags result of the SEQUENCE operation or other
means, the client will learn if some or all of the RPCSEC_GSS means, the client will learn if some or all of the RPCSEC_GSS
contexts it assigned to the backchannel have been lost. If the contexts it assigned to the backchannel have been lost. If the
client wants to retain the backchannel and/or not put recallable client wants to retain the backchannel and/or not put recallable
state subject to revocation, the client must use BACKCHANNEL_CTL state subject to revocation, the client must use BACKCHANNEL_CTL
to assign new contexts. to assign new contexts.
2.10.9.1.4. Loss of Session 2.10.10.1.4. Loss of Session
The replier might lose a record of the session. Causes include: The replier might lose a record of the session. Causes include:
o Replier crash and reboot o Replier failure and restart
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
o The server purges the session of a client that has been inactive o The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the sessionid that NFS4ERR_BADSESSION on the next operation that uses the sessionid that
refers to the lost session. refers to the lost session.
After an event like a server reboot, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session was lost. If the
connection survives session loss, then the next SEQUENCE operation connection survives session loss, then the next SEQUENCE operation
the client issues over the connection will get back the client issues over the connection will get back
NFS4ERR_BADSESSION. The client again knows the session was lost. NFS4ERR_BADSESSION. The client again knows the session was lost.
skipping to change at page 71, line 13 skipping to change at page 71, line 47
have been performed on the server at the time of session loss. The have been performed on the server at the time of session loss. The
client has no general way to recover from this. client has no general way to recover from this.
Note that loss of session does not imply loss of lock, open, Note that loss of session does not imply loss of lock, open,
delegation, or layout state because locks, opens, delegations, and delegation, or layout state because locks, opens, delegations, and
layouts are tied to the client ID and depend on the client ID, not layouts are tied to the client ID and depend on the client ID, not
the session. Nor does loss of lock, open, delegation, or layout the session. Nor does loss of lock, open, delegation, or layout
state imply loss of session state, because the session depends on the state imply loss of session state, because the session depends on the
client ID; loss of client ID however does imply loss of session, client ID; loss of client ID however does imply loss of session,
lock, open, delegation, and layout state. See Section 8.4.2. A lock, open, delegation, and layout state. See Section 8.4.2. A
session can survive a server reboot, but lock recovery may still be session can survive a server restart, but lock recovery may still be
needed. needed.
It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID
(for example the server reboots and does not preserve client ID (for example the server restarts and does not preserve client ID
state). If so, the client needs to call EXCHANGE_ID, followed by state). If so, the client needs to call EXCHANGE_ID, followed by
CREATE_SESSION. CREATE_SESSION.
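The fallback just described can be sketched as below. The exchange_id and create_session callables are hypothetical stand-ins for the corresponding operations, and status codes are modelled as strings; only the ordering of the calls is taken from the text.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_STALE_CLIENTID = "NFS4ERR_STALE_CLIENTID"


def create_session_with_recovery(exchange_id, create_session, clientid):
    """Try CREATE_SESSION; on a stale client ID, redo EXCHANGE_ID first.

    exchange_id() -> new client ID (hypothetical helper)
    create_session(clientid) -> (status, session) (hypothetical helper)
    Returns (status, clientid_in_use, session).
    """
    status, session = create_session(clientid)
    if status == NFS4ERR_STALE_CLIENTID:
        # The server no longer knows this client ID (for example, it
        # restarted without preserving client ID state).
        clientid = exchange_id()
        status, session = create_session(clientid)
    return status, clientid, session
```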
2.10.9.2. Events Requiring Server Action 2.10.10.2. Events Requiring Server Action
The following events require server action to recover. The following events require server action to recover.
2.10.9.2.1. Client Crash and Reboot 2.10.10.2.1. Client Crash and Restart
As described in Section 18.35, a rebooted client issues EXCHANGE_ID As described in Section 18.35, a restarted client issues EXCHANGE_ID
in such a way that it causes the server to delete any sessions it had. in such a way that it causes the server to delete any sessions it had.
2.10.9.2.2. Client Crash with No Reboot 2.10.10.2.2. Client Crash with No Restart
If a client crashes and never comes back, it will never issue If a client crashes and never comes back, it will never issue
EXCHANGE_ID with its old client owner. Thus the server has session EXCHANGE_ID with its old client owner. Thus the server has session
state that will never be used again. After an extended period of state that will never be used again. After an extended period of
time and if the server has resource constraints, it MAY destroy the time and if the server has resource constraints, it MAY destroy the
old session as well as locking state. old session as well as locking state.
2.10.9.2.3. Extended Network Partition 2.10.10.2.3. Extended Network Partition
To the server, the extended network partition may be no different To the server, the extended network partition may be no different
from a client crash with no reboot (see Section 2.10.9.2.2). Unless from a client crash with no restart (see Section 2.10.10.2.2).
the server can discern that there is a network partition, it is free Unless the server can discern that there is a network partition, it
to treat the situation as if the client has crashed permanently. is free to treat the situation as if the client has crashed
permanently.
2.10.9.2.4. Backchannel Connection Loss 2.10.10.2.4. Backchannel Connection Loss
If there were callback requests outstanding at the time of a If there were callback requests outstanding at the time of a
connection loss, then the server MUST retry the request, as described connection loss, then the server MUST retry the request, as described
in Section 2.10.5.2. Note that it is not necessary to retry requests in Section 2.10.5.2. Note that it is not necessary to retry requests
over a connection with the same source network address or the same over a connection with the same source network address or the same
destination network address as the lost connection. As long as the destination network address as the lost connection. As long as the
sessionid, slot id, and sequence id in the retry match that of the sessionid, slot id, and sequence id in the retry match that of the
original request, the callback target will recognize the request as a original request, the callback target will recognize the request as a
retry even if it did see the request prior to disconnect. retry even if it did see the request prior to disconnect.
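To illustrate why the connection's addresses do not matter, the following sketch keys retry detection on the sessionid, slot id, and sequence id alone. It is a simplified model of a callback target's reply cache: the class and method names are hypothetical, and sequence id ordering checks are omitted.

```python
class CallbackReplyCache:
    """Toy reply cache for a callback target (illustrative only)."""

    def __init__(self):
        # (sessionid, slotid) -> (sequenceid, cached reply)
        self._slots = {}

    def handle(self, sessionid, slotid, sequenceid, execute):
        """Run execute() once per (sessionid, slotid, sequenceid).

        A request whose identifiers match the last one seen on the slot
        is treated as a retry and answered from the cache, regardless
        of which connection carried it.
        """
        key = (sessionid, slotid)
        cached = self._slots.get(key)
        if cached is not None and cached[0] == sequenceid:
            return cached[1]
        reply = execute()
        self._slots[key] = (sequenceid, reply)
        return reply
```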
If the connection lost is the last one associated with the If the connection lost is the last one associated with the
backchannel, then the server MUST indicate that in the backchannel, then the server MUST indicate that in the
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations, each of which uses is reestablished. There are two situations, each of which uses
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
in sr_status_flags. in sr_status_flags.
2.10.9.2.5. GSS Context Loss 2.10.10.2.5. GSS Context Loss
The server SHOULD monitor when the number of RPCSEC_GSS contexts The server SHOULD monitor when the number of RPCSEC_GSS contexts
assigned to the backchannel reaches one, and that one context is near assigned to the backchannel reaches one, and that one context is near
expiry (i.e., between one and two periods of lease time), and indicate expiry (i.e., between one and two periods of lease time), and indicate
so in the sr_status_flags field of all SEQUENCE replies. The server so in the sr_status_flags field of all SEQUENCE replies. The server
MUST indicate when all of the backchannel's assigned RPCSEC_GSS MUST indicate when all of the backchannel's assigned RPCSEC_GSS
contexts have expired in the sr_status_flags field of all SEQUENCE contexts have expired in the sr_status_flags field of all SEQUENCE
replies. replies.
2.10.10. Parallel NFS and Sessions 2.10.11. Parallel NFS and Sessions
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 14.1 for pNFS it will allow the sessions to be used. See Section 14.1 for pNFS
sessions considerations. sessions considerations.
skipping to change at page 76, line 12 skipping to change at page 76, line 42
This data type represents additional information for the device file This data type represents additional information for the device file
types NF4CHR and NF4BLK. types NF4CHR and NF4BLK.
3.2.5. fsid4 3.2.5. fsid4
struct fsid4 { struct fsid4 {
uint64_t major; uint64_t major;
uint64_t minor; uint64_t minor;
}; };
3.2.6. fs_location4 3.2.6. chg_policy4
struct change_policy4 {
uint64_t cp_major;
uint64_t cp_minor;
};
The chg_policy4 data type is used for the change_policy recommended
attribute. It provides change sequencing indication analogous to the
change attribute. To enable the server to present a value valid
across server re-initialization without requiring persistent storage,
two 64-bit quantities are used, allowing one to be a server instance
id and the second to be incremented non-persistently, within a given
server instance.
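One way to picture the two-part value is the sketch below. It assumes, purely for illustration, that cp_major carries the server instance id and cp_minor the non-persistent counter; the text does not fix which field plays which role, and the class names and the use of a random instance id are hypothetical.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ChgPolicy:
    cp_major: int  # assumed here: server instance id
    cp_minor: int  # assumed here: per-instance change counter


class PolicyTracker:
    """Server-side source of change_policy values (illustrative)."""

    def __init__(self):
        # A fresh id per server instance, so no persistent storage is
        # needed to make values distinct across re-initialization.
        self._instance_id = secrets.randbits(64)
        self._counter = 0

    def bump(self):
        """Record that some policy changed within this instance."""
        self._counter = (self._counter + 1) % 2**64

    def current(self):
        return ChgPolicy(self._instance_id, self._counter)


def policy_unchanged(old, new):
    """Client-side check: only exact equality means 'no change'."""
    return old == new
```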
3.2.7. fs_location4
struct fs_location4 { struct fs_location4 {
utf8str_cis server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
3.2.7. fs_locations4 3.2.8. fs_locations4
struct fs_locations4 { struct fs_locations4 {
pathname4 fs_root; pathname4 fs_root;
fs_location4 locations<>; fs_location4 locations<>;
}; };
The fs_location4 and fs_locations4 data types are used for the The fs_location4 and fs_locations4 data types are used for the
fs_locations recommended attribute which is used for migration and fs_locations recommended attribute which is used for migration and
replication support. replication support.
3.2.8. fattr4 3.2.9. fattr4
struct fattr4 { struct fattr4 {
bitmap4 attrmask; bitmap4 attrmask;
attrlist4 attr_vals; attrlist4 attr_vals;
}; };
The fattr4 structure is used to represent file and directory The fattr4 structure is used to represent file and directory
attributes. attributes.
The bitmap is a counted array of 32 bit integers used to contain bit The bitmap is a counted array of 32 bit integers used to contain bit
values. The position of the integer in the array that contains bit n values. The position of the integer in the array that contains bit n
can be computed from the expression (n / 32) and its bit within that can be computed from the expression (n / 32) and its bit within that
integer is (n mod 32). integer is (n mod 32).
0 1 0 1
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
| count | 31 .. 0 | 63 .. 32 | | count | 31 .. 0 | 63 .. 32 |
+-----------+-----------+-----------+-- +-----------+-----------+-----------+--
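The indexing rule can be sketched as follows. The helper names are hypothetical, the counted array is modelled as a Python list of 32-bit words, and bit (n mod 32) is assumed to mean the value 1 << (n mod 32) within the word, as the diagram suggests.

```python
def bit_position(n):
    """Return (word index, bit within word) for attribute number n."""
    return divmod(n, 32)


def set_bit(words, n):
    """Set bit n, growing the word list with zeros as needed."""
    idx, bit = divmod(n, 32)
    while len(words) <= idx:
        words.append(0)
    words[idx] |= 1 << bit


def has_bit(words, n):
    """Return True if bit n is set; absent words count as zero."""
    idx, bit = divmod(n, 32)
    return idx < len(words) and bool(words[idx] & (1 << bit))
```

For example, attribute number 60 lands in word 1 at bit 28, so a bitmap holding only that attribute needs a count of two words.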
3.2.9. change_info4 3.2.10. change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
}; };
This structure is used with the CREATE, LINK, REMOVE, RENAME This structure is used with the CREATE, LINK, REMOVE, RENAME
operations to let the client know the value of the change attribute operations to let the client know the value of the change attribute
for the directory in which the target file system object resides. for the directory in which the target file system object resides.
3.2.10. netaddr4 3.2.11. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 structure is used to identify TCP/IP based endpoints. The netaddr4 structure is used to identify TCP/IP based endpoints.
The r_netid and r_addr fields are specified in RFC1833 [26], but they The r_netid and r_addr fields are specified in RFC1833 [26], but they
are underspecified in RFC1833 [26] as far as what they should look are underspecified in RFC1833 [26] as far as what they should look
skipping to change at page 78, line 20 skipping to change at page 79, line 20
representing an IPv6 address as defined in Section 2.2 of RFC1884 representing an IPv6 address as defined in Section 2.2 of RFC1884
[13]. Additionally, the two alternative forms specified in Section [13]. Additionally, the two alternative forms specified in Section
2.2 of RFC1884 [13] are also acceptable. 2.2 of RFC1884 [13] are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
over IPv6 the value of r_netid is the string "udp6". That this over IPv6 the value of r_netid is the string "udp6". That this
document specifies the universal address and netid for UDP/IPv6 does document specifies the universal address and netid for UDP/IPv6 does
not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
3.2.11. state_owner4 3.2.12. state_owner4
struct state_owner4 { struct state_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
typedef state_owner4 open_owner4; typedef state_owner4 open_owner4;
typedef state_owner4 lock_owner4; typedef state_owner4 lock_owner4;
The state_owner4 data type is the base type for the open_owner4 The state_owner4 data type is the base type for the open_owner4
(Section 3.2.11.1) and lock_owner4 (Section 3.2.11.2). NFS4_OPAQUE_LIMIT (Section 3.2.12.1) and lock_owner4 (Section 3.2.12.2). NFS4_OPAQUE_LIMIT
is defined as 1024. is defined as 1024.
3.2.11.1. open_owner4 3.2.12.1. open_owner4
This structure is used to identify the owner of open state. This structure is used to identify the owner of open state.
3.2.11.2. lock_owner4 3.2.12.2. lock_owner4
This structure is used to identify the owner of file locking state. This structure is used to identify the owner of file locking state.
3.2.12. open_to_lock_owner4 3.2.13. open_to_lock_owner4
struct open_to_lock_owner4 { struct open_to_lock_owner4 {
seqid4 open_seqid; seqid4 open_seqid;
stateid4 open_stateid; stateid4 open_stateid;
seqid4 lock_seqid; seqid4 lock_seqid;
lock_owner4 lock_owner; lock_owner4 lock_owner;
}; };
This structure is used for the first LOCK operation done for an This structure is used for the first LOCK operation done for an
open_owner4. It provides both the open_stateid and lock_owner such open_owner4. It provides both the open_stateid and lock_owner such
that the transition is made from a valid open_stateid sequence to that the transition is made from a valid open_stateid sequence to
that of the new lock_stateid sequence. Using this mechanism avoids that of the new lock_stateid sequence. Using this mechanism avoids
the confirmation of the lock_owner/lock_seqid pair since it is tied the confirmation of the lock_owner/lock_seqid pair since it is tied
to established state in the form of the open_stateid/open_seqid. to established state in the form of the open_stateid/open_seqid.
3.2.13. stateid4 3.2.14. stateid4
struct stateid4 { struct stateid4 {
uint32_t seqid; uint32_t seqid;
opaque other[12]; opaque other[12];
}; };
This structure is used for the various state sharing mechanisms This structure is used for the various state sharing mechanisms
between the client and server. For the client, this data structure between the client and server. For the client, this data structure
is read-only. The starting value of the seqid field is undefined. is read-only. The starting value of the seqid field is undefined.
The server is required to increment the seqid field monotonically at The server is required to increment the seqid field monotonically at
each transition of the stateid. This is important since the client each transition of the stateid. This is important since the client
will inspect the seqid in OPEN stateids to determine the order of will inspect the seqid in OPEN stateids to determine the order of
OPEN processing done by the server. OPEN processing done by the server.
3.2.14. layouttype4 3.2.15. layouttype4
enum layouttype4 { enum layouttype4 {
LAYOUT4_NFSV4_1_FILES = 1, LAYOUT4_NFSV4_1_FILES = 1,
LAYOUT4_OSD2_OBJECTS = 2, LAYOUT4_OSD2_OBJECTS = 2,
LAYOUT4_BLOCK_VOLUME = 3 LAYOUT4_BLOCK_VOLUME = 3
}; };
A layout type specifies the layout being used. The implication is A layout type specifies the layout being used. The implication is
that clients have "layout drivers" that support one or more layout that clients have "layout drivers" that support one or more layout
types. The file server advertises the layout types it supports types. The file server advertises the layout types it supports
skipping to change at page 80, line 5 skipping to change at page 81, line 5
globally unique and are assigned according to the description in globally unique and are assigned according to the description in
Section 22.1; they are maintained by IANA. Types within the range Section 22.1; they are maintained by IANA. Types within the range
0x80000000-0xFFFFFFFF are site specific and for "private use" only. 0x80000000-0xFFFFFFFF are site specific and for "private use" only.
The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file The LAYOUT4_NFSV4_1_FILES enumeration specifies that the NFSv4.1 file
layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration layout type is to be used. The LAYOUT4_OSD2_OBJECTS enumeration
specifies that the object layout, as defined in [29], is to be used. specifies that the object layout, as defined in [29], is to be used.
Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the block/volume Similarly, the LAYOUT4_BLOCK_VOLUME enumeration specifies that the block/volume
layout, as defined in [30], is to be used. layout, as defined in [30], is to be used.
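A client-side check for the private-use range might look like the sketch below. The enumeration values and the range bounds are taken from the text above; the function name and the lookup table are hypothetical.

```python
LAYOUT_TYPE_NAMES = {
    1: "LAYOUT4_NFSV4_1_FILES",
    2: "LAYOUT4_OSD2_OBJECTS",
    3: "LAYOUT4_BLOCK_VOLUME",
}


def describe_layout_type(value):
    """Return a short description of a 32-bit layout type value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layout type must fit in 32 bits")
    if 0x80000000 <= value <= 0xFFFFFFFF:
        # Site specific; meaningful only by local agreement.
        return "private use"
    return LAYOUT_TYPE_NAMES.get(value, "unrecognized")
```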
3.2.15. deviceid4 3.2.16. deviceid4
typedef uint32_t deviceid4; typedef uint64_t deviceid4;
Layout information includes device IDs that specify a storage device Layout information includes device IDs that specify a storage device
through a compact handle. Addressing and type information is through a compact handle. Addressing and type information is
obtained with the GETDEVICEINFO operation. A client must not assume obtained with the GETDEVICEINFO operation. A client must not assume
that device IDs are valid across metadata server reboots. The device that device IDs are valid across metadata server reboots. The device
ID is qualified by the layout type and is unique per file system ID is qualified by the layout type and is unique per file system
(FSID). See Section 13.2.10 for more details. (FSID). See Section 13.2.10 for more details.
3.2.16. device_addr4 3.2.17. device_addr4
struct device_addr4 { struct device_addr4 {
layouttype4 da_layout_type; layouttype4 da_layout_type;
opaque da_addr_body<>; opaque da_addr_body<>;
}; };
The device address is used to set up a communication channel with the The device address is used to set up a communication channel with the
storage device. Different layout types will require different types storage device. Different layout types will require different types
of structures to define how they communicate with storage devices. of structures to define how they communicate with storage devices.
The opaque da_addr_body field must be interpreted based on the The opaque da_addr_body field must be interpreted based on the
specified da_layout_type field. specified da_layout_type field.
This document defines the device address for the NFSv4.1 file layout This document defines the device address for the NFSv4.1 file layout
([[Comment.7: need xref]]), which identifies a storage device by ([[Comment.7: need xref]]), which identifies a storage device by
network IP address and port number. This is sufficient for the network IP address and port number. This is sufficient for the
clients to communicate with the NFSv4.1 storage devices, and may be clients to communicate with the NFSv4.1 storage devices, and may be
sufficient for other layout types as well. Device types for object sufficient for other layout types as well. Device types for object
storage devices and block storage devices (e.g., SCSI volume labels) storage devices and block storage devices (e.g., SCSI volume labels)
will be defined by their respective layout specifications. will be defined by their respective layout specifications.
3.2.17. devlist_item4 3.2.18. devlist_item4
struct devlist_item4 { struct devlist_item4 {
deviceid4 dli_id; deviceid4 dli_id;
device_addr4 dli_device_addr<>; device_addr4 dli_device_addr;
}; };
An array of these values is returned by the GETDEVICELIST operation. An array of these values is returned by the GETDEVICELIST operation.
They define the set of devices associated with a file system for the They define the set of devices associated with a file system for the
layout type specified in the GETDEVICELIST4args. layout type specified in the GETDEVICELIST4args.
3.2.18. layout_content4 3.2.19. layout_content4
struct layout_content4 { struct layout_content4 {
layouttype4 loc_type; layouttype4 loc_type;
opaque loc_body<>; opaque loc_body<>;
}; };
The loc_body field must be interpreted based on the layout type The loc_body field must be interpreted based on the layout type
(loc_type). This document defines the loc_body for the NFSv4.1 file (loc_type). This document defines the loc_body for the NFSv4.1 file
layout type; see Section 14.3 for its definition. layout type; see Section 14.3 for its definition.
3.2.19. layout4 3.2.20. layout4
struct layout4 { struct layout4 {
offset4 lo_offset; offset4 lo_offset;
length4 lo_length; length4 lo_length;
layoutiomode4 lo_iomode; layoutiomode4 lo_iomode;
layout_content4 lo_content; layout_content4 lo_content;
}; };
The layout4 structure defines a layout for a file. The layout type The layout4 structure defines a layout for a file. The layout type
specific data is opaque within lo_content. Since layouts are sub- specific data is opaque within lo_content. Since layouts are sub-
dividable, the offset and length together with the file's filehandle, dividable, the offset and length together with the file's filehandle,
the client ID, iomode, and layout type, identify the layout. the client ID, iomode, and layout type, identify the layout.
3.2.20. layoutupdate4 3.2.21. layoutupdate4
struct layoutupdate4 { struct layoutupdate4 {
layouttype4 lou_type; layouttype4 lou_type;
opaque lou_body<>; opaque lou_body<>;
}; };
The layoutupdate4 structure is used by the client to return 'updated' The layoutupdate4 structure is used by the client to return 'updated'
layout information to the metadata server at LAYOUTCOMMIT time. This layout information to the metadata server at LAYOUTCOMMIT time. This
structure provides a channel to pass layout type specific information structure provides a channel to pass layout type specific information
(in field lou_body) back to the metadata server. E.g., for block/ (in field lou_body) back to the metadata server. E.g., for block/
volume layout types this could include the list of reserved blocks volume layout types this could include the list of reserved blocks
that were written. The contents of the opaque lou_body argument are that were written. The contents of the opaque lou_body argument are
determined by the layout type and are defined in their context. The determined by the layout type and are defined in their context. The
NFSv4.1 file-based layout does not use this structure, thus the NFSv4.1 file-based layout does not use this structure, thus the
lou_body field should have a zero length. lou_body field should have a zero length.
3.2.21. layouthint4 3.2.22. layouthint4
struct layouthint4 { struct layouthint4 {
layouttype4 loh_type; layouttype4 loh_type;
opaque loh_body<>; opaque loh_body<>;
}; };
The layouthint4 structure is used by the client to pass in a hint The layouthint4 structure is used by the client to pass in a hint
about the type of layout it would like created for a particular file. about the type of layout it would like created for a particular file.
It is the structure specified by the layout_hint attribute described It is the structure specified by the layout_hint attribute described
in Section 5.13.4. The metadata server may ignore the hint, or may in Section 5.13.4. The metadata server may ignore the hint, or may
selectively ignore fields within the hint. This hint should be selectively ignore fields within the hint. This hint should be
provided at create time as part of the initial attributes within provided at create time as part of the initial attributes within
OPEN. The loh_body field is specific to the type of layout OPEN. The loh_body field is specific to the type of layout
(loh_type). The NFSv4.1 file-based layout uses the (loh_type). The NFSv4.1 file-based layout uses the
nfsv4_1_file_layouthint4 structure as defined in Section 14.3. nfsv4_1_file_layouthint4 structure as defined in Section 14.3.
3.2.22. layoutiomode4 3.2.23. layoutiomode4
enum layoutiomode4 { enum layoutiomode4 {
LAYOUTIOMODE4_READ = 1, LAYOUTIOMODE4_READ = 1,
LAYOUTIOMODE4_RW = 2, LAYOUTIOMODE4_RW = 2,
LAYOUTIOMODE4_ANY = 3 LAYOUTIOMODE4_ANY = 3
}; };
The iomode specifies whether the client intends to read or write The iomode specifies whether the client intends to read or write
(with the possibility of reading) the data represented by the layout. (with the possibility of reading) the data represented by the layout.
The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be The ANY iomode MUST NOT be used for LAYOUTGET; however, it can be
used for LAYOUTRETURN and LAYOUTRECALL. The ANY iomode specifies used for LAYOUTRETURN and LAYOUTRECALL. The ANY iomode specifies
that layouts pertaining to both READ and RW iomodes are being that layouts pertaining to both READ and RW iomodes are being
returned or recalled, respectively. The metadata server's use of the returned or recalled, respectively. The metadata server's use of the
iomode may depend on the layout type being used. The storage devices iomode may depend on the layout type being used. The storage devices
may validate I/O accesses against the iomode and reject invalid may validate I/O accesses against the iomode and reject invalid
accesses. accesses.
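The restriction on the ANY iomode can be expressed as a small validity check. This is a sketch only: operation names are passed as strings, the function names are hypothetical, and the matching rule for ANY follows the sentence above about READ and RW layouts.

```python
from enum import IntEnum


class LayoutIOMode(IntEnum):
    READ = 1
    RW = 2
    ANY = 3


def iomode_allowed(op, iomode):
    """Return True if iomode may be used with the named operation."""
    if op not in ("LAYOUTGET", "LAYOUTRETURN", "LAYOUTRECALL"):
        raise ValueError("unexpected operation: " + op)
    if iomode == LayoutIOMode.ANY:
        # ANY is for returning or recalling, never for LAYOUTGET.
        return op != "LAYOUTGET"
    return True


def iomode_matches(requested, held):
    """Does a return/recall with 'requested' cover a layout held as 'held'?"""
    return requested == LayoutIOMode.ANY or requested == held
```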
3.2.23. nfs_impl_id4 3.2.24. nfs_impl_id4
struct nfs_impl_id4 { struct nfs_impl_id4 {
utf8str_cis nii_domain; utf8str_cis nii_domain;
utf8str_cs nii_name; utf8str_cs nii_name;
nfstime4 nii_date; nfstime4 nii_date;
}; };
This structure is used to identify client and server implementation This structure is used to identify client and server implementation
detail. The nii_domain field is the DNS domain name that the detail. The nii_domain field is the DNS domain name that the
implementer is associated with. The nii_name field is the product implementer is associated with. The nii_name field is the product
name of the implementation and is completely free form. It is name of the implementation and is completely free form. It is
recommended that the nii_name be used to distinguish machine recommended that the nii_name be used to distinguish machine
architecture, machine platforms, revisions, versions, and patch architecture, machine platforms, revisions, versions, and patch
levels. The nii_date field is the timestamp of when the software levels. The nii_date field is the timestamp of when the software
instance was published or built. instance was published or built.
3.2.24. threshold_item4 3.2.25. threshold_item4
struct threshold_item4 { struct threshold_item4 {
layouttype4 thi_layout_type; layouttype4 thi_layout_type;
bitmap4 thi_hintset; bitmap4 thi_hintset;
opaque thi_hintlist<>; opaque thi_hintlist<>;
}; };
This structure contains a list of hints specific to a layout type for This structure contains a list of hints specific to a layout type for
helping the client determine when it should issue I/O directly helping the client determine when it should issue I/O directly
through the metadata server vs. the data servers. The hint structure through the metadata server vs. the data servers. The hint structure
skipping to change at page 83, line 45 skipping to change at page 84, line 45
| threshold4_read_iosize | 2 | length4 | For read I/O sizes below | | threshold4_read_iosize | 2 | length4 | For read I/O sizes below |
| | | | this threshold it is | | | | | this threshold it is |
| | | | recommended to read data | | | | | recommended to read data |
| | | | through the MDS | | | | | through the MDS |
| threshold4_write_iosize | 3 | length4 | For write I/O sizes below | | threshold4_write_iosize | 3 | length4 | For write I/O sizes below |
| | | | this threshold it is | | | | | this threshold it is |
| | | | recommended to write data | | | | | recommended to write data |
| | | | through the MDS | | | | | through the MDS |
+-------------------------+---+---------+---------------------------+ +-------------------------+---+---------+---------------------------+
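A client might apply the two I/O size hints shown above along the lines of this sketch. The dictionary keys mirror the hint names in the table; the function name and the treatment of a missing hint (no recommendation, so the data servers are used) are assumptions made for illustration.

```python
def route_io(hints, is_write, io_size):
    """Return "MDS" or "DS" for an I/O of io_size bytes.

    hints maps hint names to length4 values; a hint the server did not
    supply is simply absent from the dictionary.
    """
    key = "threshold4_write_iosize" if is_write else "threshold4_read_iosize"
    threshold = hints.get(key)
    if threshold is not None and io_size < threshold:
        # Below the threshold, going through the metadata server is
        # the recommended path.
        return "MDS"
    return "DS"
```

Because these are hints, the result is advisory; nothing in the sketch prevents a client from choosing the other path.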
3.2.25. mdsthreshold4 3.2.26. mdsthreshold4
struct mdsthreshold4 { struct mdsthreshold4 {
threshold_item4 mth_hints<>; threshold_item4 mth_hints<>;
}; };
This structure holds an array of threshold_item4 structures each of This structure holds an array of threshold_item4 structures each of
which is valid for a particular layout type. An array is necessary which is valid for a particular layout type. An array is necessary
since a server can support multiple layout types for a single file. since a server can support multiple layout types for a single file.
4. Filehandles 4. Filehandles
skipping to change at page 92, line 27 skipping to change at page 93, line 27
lease_time lease_time
o The per file system attributes are: o The per file system attributes are:
supp_attr, fh_expire_type, link_support, symlink_support, supp_attr, fh_expire_type, link_support, symlink_support,
unique_handles, aclsupport, cansettime, case_insensitive, unique_handles, aclsupport, cansettime, case_insensitive,
case_preserving, chown_restricted, files_avail, files_free, case_preserving, chown_restricted, files_avail, files_free,
files_total, fs_locations, homogeneous, maxfilesize, maxname, files_total, fs_locations, homogeneous, maxfilesize, maxname,
maxread, maxwrite, no_trunc, space_avail, space_free, maxread, maxwrite, no_trunc, space_avail, space_free,
space_total, time_delta, fs_status, fs_layout_type, space_total, time_delta, change_policy, fs_status,
fs_locations_info fs_layout_type, fs_locations_info
o The per file system object attributes are: o The per file system object attributes are:
type, change, size, named_attr, fsid, rdattr_error, filehandle, type, change, size, named_attr, fsid, rdattr_error, filehandle,
ACL, archive, fileid, hidden, maxlink, mimetype, mode, ACL, archive, fileid, hidden, maxlink, mimetype, mode,
numlinks, owner, owner_group, rawdev, space_used, system, numlinks, owner, owner_group, rawdev, space_used, system,
time_access, time_backup, time_create, time_metadata, time_access, time_backup, time_create, time_metadata,
time_modify, mounted_on_fileid, dir_notif_delay, time_modify, mounted_on_fileid, dir_notif_delay,
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, dirent_notif_delay, dacl, sacl, layout_type, layout_hint,
layout_blksize, layout_alignment, mdsthreshold, retention_get, layout_blksize, layout_alignment, mdsthreshold, retention_get,
skipping to change at page 95, line 41 skipping to change at page 97, line 4
| | | | | specified in a | | | | | | specified in a |
| | | | | SETATTR | | | | | | SETATTR |
| | | | | operation. | | | | | | operation. |
| case_insensitive | 16 | bool | READ | True, if | | case_insensitive | 16 | bool | READ | True, if |
| | | | | filename | | | | | | filename |
| | | | | comparisons on | | | | | | comparisons on |
| | | | | this file | | | | | | this file |
| | | | | system are | | | | | | system are |
| | | | | case | | | | | | case |
| | | | | insensitive. | | | | | | insensitive. |
| change_policy | 60 | chg_policy4 | READ | A value |
| | | | | created by the |
| | | | | server that |
| | | | | the client can |
| | | | | use to |
| | | | | determine if |
| | | | | some server |
| | | | | policy related |
| | | | | to the current |
| | | | | filesystem has |
| | | | | been subject |
| | | | | to change. If |
| | | | | the value |
| | | | | remains the |
| | | | | same then the |
| | | | | client can be |
| | | | | sure that the |
| | | | | values of the |
| | | | | attributes |
| | | | | related to fs |
| | | | | location and |
| | | | | the |
| | | | | fsstat_type |
| | | | | field of the |
| | | | | fs_status |
| | | | | attribute have |
| | | | | not changed. |
| | | | | See |
| | | | | Section 3.2.6 |
| | | | | for details. |
| case_preserving | 17 | bool | READ | True, if | | case_preserving | 17 | bool | READ | True, if |
| | | | | filename case | | | | | | filename case |
| | | | | on this file | | | | | | on this file |
| | | | | system are | | | | | | system are |
| | | | | preserved. | | | | | | preserved. |
| chown_restricted | 18 | bool | READ | If TRUE, the | | chown_restricted | 18 | bool | READ | If TRUE, the |
| | | | | server will | | | | | | server will |
| | | | | reject any | | | | | | reject any |
| | | | | request to | | | | | | request to |
| | | | | change either | | | | | | change either |
skipping to change at page 96, line 35 skipping to change at page 98, line 35
| dacl | 58 | nfsacl41 | R/W | Access Control | | dacl | 58 | nfsacl41 | R/W | Access Control |
| | | | | List used for | | | | | | List used for |
| | | | | determining | | | | | | determining |
| | | | | access to file | | | | | | access to file |
| | | | | system | | | | | | system |
| | | | | objects. | | | | | | objects. |
| dir_notif_delay | 56 | nfstime4 | READ | notification | | dir_notif_delay | 56 | nfstime4 | READ | notification |
| | | | | delays on | | | | | | delays on |
| | | | | directory | | | | | | directory |
| | | | | attributes | | | | | | attributes |
| dirent_ | 57 | nfstime4 | READ | notification | | dirent_notif_dela | 57 | nfstime4 | READ | notification |
| notif_delay | | | | delays on | | y | | | | delays on |
| | | | | child | | | | | | child |
| | | | | attributes | | | | | | attributes |
| fileid | 20 | uint64 | READ | A number | | fileid | 20 | uint64 | READ | A number |
| | | | | uniquely | | | | | | uniquely |
| | | | | identifying | | | | | | identifying |
| | | | | the file | | | | | | the file |
| | | | | within the | | | | | | within the |
| | | | | file system. | | | | | | file system. |
| files_avail | 21 | uint64 | READ | File slots | | files_avail | 21 | uint64 | READ | File slots |
| | | | | available to | | | | | | available to |
skipping to change at page 97, line 29 skipping to change at page 99, line 29
| | | | | this object - | | | | | | this object - |
| | | | | this should be | | | | | | this should be |
| | | | | the smallest | | | | | | the smallest |
| | | | | relevant | | | | | | relevant |
| | | | | limit. | | | | | | limit. |
| files_total | 23 | uint64 | READ | Total file | | files_total | 23 | uint64 | READ | Total file |
| | | | | slots on the | | | | | | slots on the |
| | | | | file system | | | | | | file system |
| | | | | containing | | | | | | containing |
| | | | | this object. | | | | | | this object. |
| fs_absent | 60 | bool | READ | Is current |
| | | | | file system |
| | | | | present or |
| | | | | absent. |
| fs_layout_type | 62 | layouttype4<> | READ | Layout types | | fs_layout_type | 62 | layouttype4<> | READ | Layout types |
| | | | | available for | | | | | | available for |
| | | | | the file | | | | | | the file |
| | | | | system. | | | | | | system. |
| fs_locations | 24 | fs_locations | READ | Locations | | fs_locations | 24 | fs_locations | READ | Locations |
| | | | | where this | | | | | | where this |
| | | | | file system | | | | | | file system |
| | | | | may be found. | | | | | | may be found. |
| | | | | If the server | | | | | | If the server |
| | | | | returns | | | | | | returns |
skipping to change at page 109, line 38 skipping to change at page 111, line 38
The dirent_notif_delay attribute is the minimum number of seconds the The dirent_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to a file server will delay before notifying the client of a change to a file
object that has an entry in the directory. object that has an entry in the directory.
5.13. PNFS Attributes 5.13. PNFS Attributes
5.13.1. fs_layout_type 5.13.1. fs_layout_type
The fs_layout_type attribute (data type layouttype4, see The fs_layout_type attribute (data type layouttype4, see
Section 3.2.14) applies to a file system and indicates what layout Section 3.2.15) applies to a file system and indicates what layout
types are supported by the file system. This attribute is expected types are supported by the file system. This attribute is expected
to be queried when a client encounters a new fsid. This attribute is to be queried when a client encounters a new fsid. This attribute is
used by the client to determine if it supports the layout type. used by the client to determine if it supports the layout type.
5.13.2. layout_alignment 5.13.2. layout_alignment
The layout_alignment attribute indicates the preferred alignment for The layout_alignment attribute indicates the preferred alignment for
I/O to files on the file system the client has layouts for. Where I/O to files on the file system the client has layouts for. Where
possible, the client should issue READ and WRITE operations with possible, the client should issue READ and WRITE operations with
offsets that are whole multiples of the layout_alignment attribute. offsets that are whole multiples of the layout_alignment attribute.
skipping to change at page 110, line 16 skipping to change at page 112, line 16
The layout_blksize attribute indicates the preferred block size for The layout_blksize attribute indicates the preferred block size for
I/O to files on the file system the client has layouts for. Where I/O to files on the file system the client has layouts for. Where
possible, the client should issue READ operations with a count possible, the client should issue READ operations with a count
argument that is a whole multiple of layout_blksize, and WRITE argument that is a whole multiple of layout_blksize, and WRITE
operations with a data argument of size that is a whole multiple of operations with a data argument of size that is a whole multiple of
layout_blksize. layout_blksize.
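The alignment and block size preferences above can be illustrated
with a minimal sketch. It assumes a client that is free to widen a
READ request; the function name align_io_range and its parameters are
hypothetical and are not defined by the protocol. A WRITE cannot
simply be widened this way, since the extra bytes would first have to
be read.

```python
def align_io_range(offset, count, layout_alignment, layout_blksize):
    """Widen [offset, offset + count) to the preferred I/O geometry.

    The returned offset is a whole multiple of layout_alignment and the
    returned count is a whole multiple of layout_blksize. Both attribute
    values are assumed to be positive integers obtained via GETATTR.
    """
    if layout_alignment <= 0 or layout_blksize <= 0:
        raise ValueError("attribute values must be positive")
    # Round the starting offset down to the alignment boundary.
    aligned_offset = (offset // layout_alignment) * layout_alignment
    # Bytes that must be covered, measured from the aligned start.
    length = (offset + count) - aligned_offset
    # Round the length up to a whole number of blocks (ceiling division).
    aligned_count = -(-length // layout_blksize) * layout_blksize
    return aligned_offset, aligned_count
```

Rounding the offset down and the count up never shrinks the range the
caller asked for, so the original bytes are always covered.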
5.13.4. layout_hint 5.13.4. layout_hint
The layout_hint attribute (data type layouthint4, see Section 3.2.21) The layout_hint attribute (data type layouthint4, see Section 3.2.22)
may be set on newly created files to influence the metadata server's may be set on newly created files to influence the metadata server's
choice for the file's layout. It is suggested that this attribute is choice for the file's layout. It is suggested that this attribute is
set as one of the initial attributes within the OPEN call. The set as one of the initial attributes within the OPEN call. The
metadata server may ignore this attribute. This attribute is a sub- metadata server may ignore this attribute. This attribute is a sub-
set of the layout structure returned by LAYOUTGET. For example, set of the layout structure returned by LAYOUTGET. For example,
instead of specifying particular devices, this would be used to instead of specifying particular devices, this would be used to
suggest the stripe width of a file. It is up to the server suggest the stripe width of a file. It is up to the server
implementation to determine which fields within the layout it uses. implementation to determine which fields within the layout it uses.
5.13.5. layout_type 5.13.5. layout_type
skipping to change at page 138, line 29 skipping to change at page 140, line 29
7.1. Server Exports 7.1. Server Exports
On a UNIX server, the namespace describes all the files reachable by On a UNIX server, the namespace describes all the files reachable by
pathnames under the root directory or "/". On a Windows NT server pathnames under the root directory or "/". On a Windows NT server
the namespace constitutes all the files on disks named by mapped disk the namespace constitutes all the files on disks named by mapped disk
letters. NFS server administrators rarely make the entire server's letters. NFS server administrators rarely make the entire server's
file system namespace available to NFS clients. More often portions file system namespace available to NFS clients. More often portions
of the namespace are made available via an "export" feature. In of the namespace are made available via an "export" feature. In
previous versions of the NFS protocol, the root filehandle for each previous versions of the NFS protocol, the root filehandle for each
export is obtained through the MOUNT protocol; the client sends a export is obtained through the MOUNT protocol; the client sent a
string that identifies the export of namespace and the server returns string that identified the export name within the namespace and the
the root filehandle for it. The MOUNT protocol supports an EXPORTS server returned the root filehandle for that export. The MOUNT
procedure that will enumerate the server's exports. protocol also provided an EXPORTS procedure that enumerated the
server's exports.
7.2. Browsing Exports 7.2. Browsing Exports
The NFS version 4 protocol provides a root filehandle that clients The NFS version 4 protocol provides a root filehandle that clients
can use to obtain filehandles for the exports of a particular server, can use to obtain filehandles for the exports of a particular server,
via a series of LOOKUP operations within a COMPOUND, to traverse a via a series of LOOKUP operations within a COMPOUND, to traverse a
path. A common user experience is to use a graphical user interface path. A common user experience is to use a graphical user interface
(perhaps a file "Open" dialog window) to find a file via progressive (perhaps a file "Open" dialog window) to find a file via progressive
browsing through a directory tree. The client must be able to move browsing through a directory tree. The client must be able to move
from one export to another export via single-component, progressive from one export to another export via single-component, progressive
LOOKUP operations. LOOKUP operations.
This style of browsing is not well supported by the NFS version 2 and This style of browsing is not well supported by the NFS version 2 and
3 protocols. The client expects all LOOKUP operations to remain 3 protocols. In these versions of NFS, the client expects all LOOKUP
within a single server file system. For example, the device operations to remain within a single server file system. For
attribute will not change. This prevents a client from taking example, the device attribute will not change. This prevents a
namespace paths that span exports. client from taking namespace paths that span exports.
In the case of Veriosn 2 and 3, an automounter on the client can In the case of Versions 2 and 3, an automounter on the client can
obtain a snapshot of the server's namespace using the EXPORTS obtain a snapshot of the server's namespace using the EXPORTS
procedure of the MOUNT protocol. If it understands the server's procedure of the MOUNT protocol. If it understands the server's
pathname syntax, it can create an image of the server's namespace on pathname syntax, it can create an image of the server's namespace on
the client. The parts of the namespace that are not exported by the the client. The parts of the namespace that are not exported by the
server are filled in with directories that might be arranged similarly server are filled in with directories that might be arranged similarly
to a version 4 "pseudo file system" that allows the user to browse to a version 4 "pseudo file system" that allows the user to browse
from one mounted file system to another. There is a drawback to this from one mounted file system to another. There is a drawback to this
representation of the server's namespace on the client: it is static. representation of the server's namespace on the client: it is static.
If the server administrator adds a new export the client will be If the server administrator adds a new export the client will be
unaware of it. unaware of it.
skipping to change at page 139, line 31 skipping to change at page 141, line 31
a single namespace, for that server. An NFS version 4 client uses a single namespace, for that server. An NFS version 4 client uses
LOOKUP and READDIR operations to browse seamlessly from one export to LOOKUP and READDIR operations to browse seamlessly from one export to
another. another.
Where there are portions of the server namespace that are not Where there are portions of the server namespace that are not
exported, clients require some way of traversing those portions to exported, clients require some way of traversing those portions to
reach actual exported file systems. A technique that servers may use reach actual exported file systems. A technique that servers may use
to provide for this is to bridge unexported portions of the namespace to provide for this is to bridge unexported portions of the namespace
via a "pseudo file system" that provides a view of exported via a "pseudo file system" that provides a view of exported
directories only. A pseudo file system has a unique fsid and behaves directories only. A pseudo file system has a unique fsid and behaves
like a normal, read only file system. like a normal, read-only file system.
Based on the construction of the server's namespace, it is possible Based on the construction of the server's namespace, it is possible
that multiple pseudo file systems may exist. For example, that multiple pseudo file systems may exist. For example,
/a pseudo file system /a pseudo file system
/a/b real file system /a/b real file system
/a/b/c pseudo file system /a/b/c pseudo file system
/a/b/c/d real file system /a/b/c/d real file system
Each of the pseudo file systems are considered separate entities and Each of the pseudo file systems is considered a separate entity and
therefore MUST have its own unique fsid. therefore MUST have its own fsid, unique among all the fsids for that
server.
7.4. Multiple Roots 7.4. Multiple Roots
Certain operating environments are sometimes described as having Certain operating environments are sometimes described as having
"multiple roots". In such environments individual file systems are "multiple roots". In such environments individual file systems are
commonly represented by disk or volume names. NFS version 4 servers commonly represented by disk or volume names. NFS version 4 servers
for these platforms can construct a pseudo file system above these for these platforms can construct a pseudo file system above these
root names so that disk letters or volume names are simply directory root names so that disk letters or volume names are simply directory
names in the pseudo root. names in the pseudo root.
skipping to change at page 140, line 22 skipping to change at page 142, line 22
which persistent filehandles could be constructed. Even though it is which persistent filehandles could be constructed. Even though it is
preferable that the server provide persistent filehandles for the preferable that the server provide persistent filehandles for the
pseudo file system, the NFS client should expect that pseudo file pseudo file system, the NFS client should expect that pseudo file
system filehandles are volatile. This can be confirmed by checking system filehandles are volatile. This can be confirmed by checking
the associated "fh_expire_type" attribute for those filehandles in the associated "fh_expire_type" attribute for those filehandles in
question. If the filehandles are volatile, the NFS client must be question. If the filehandles are volatile, the NFS client must be
prepared to recover a filehandle value (e.g. with a series of LOOKUP prepared to recover a filehandle value (e.g. with a series of LOOKUP
operations) when receiving an error of NFS4ERR_FHEXPIRED. operations) when receiving an error of NFS4ERR_FHEXPIRED.
Because it is quite likely that servers will implement pseudo file Because it is quite likely that servers will implement pseudo file
systems using volative filehandles, clients need to be prepared for systems using volatile filehandles, clients need to be prepared for
them, rather than assuming that all filehandles will be persistent. them, rather than assuming that all filehandles will be persistent.
7.6. Exported Root 7.6. Exported Root
If the server's root file system is exported, one might conclude that If the server's root file system is exported, one might conclude that
a pseudo-file system is unneeded. This is not necessarily so. Assume a pseudo file system is unneeded. This is not necessarily so. Assume
the following file systems on a server: the following file systems on a server:
/ fs1 (exported) / fs1 (exported)
/a fs2 (not exported) /a fs2 (not exported)
/a/b fs3 (exported) /a/b fs3 (exported)
Because fs2 is not exported, fs3 cannot be reached with simple Because fs2 is not exported, fs3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-file system. LOOKUPs. The server must bridge the gap with a pseudo file system.
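As an illustration only, the following sketch computes which
directories a server would have to supply from a pseudo file system
in order to bridge such gaps. The input format (a mapping from mount
point to an "exported" flag) and the function names are assumptions
made for this example; the protocol does not prescribe how a server
derives its pseudo file system.

```python
import posixpath


def containing_fs(path, mounts):
    """Return the longest mount point that contains path, or None."""
    best = None
    for m in mounts:
        if path == m or path.startswith(m.rstrip("/") + "/"):
            if best is None or len(m) > len(best):
                best = m
    return best


def pseudo_dirs(mounts):
    """List directories that must be bridged by a pseudo file system.

    mounts maps each server mount point to True if that file system is
    exported. An ancestor of an exported file system needs bridging
    when it does not itself lie within an exported file system.
    """
    needed = set()
    for mount, exported in mounts.items():
        if not exported or mount == "/":
            continue
        parent = posixpath.dirname(mount)
        while True:
            fs = containing_fs(parent, mounts)
            if fs is None or not mounts[fs]:
                needed.add(parent)
            if parent == "/":
                break
            parent = posixpath.dirname(parent)
    return sorted(needed)
```

With the three file systems listed above, only "/a" needs to be
bridged, since "/" is served by the exported fs1.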
7.7. Mount Point Crossing 7.7. Mount Point Crossing
The server file system environment may be constructed in such a way The server file system environment may be constructed in such a way
that one file system contains a directory which is 'covered' or that one file system contains a directory which is 'covered' or
mounted upon by a second file system. For example: mounted upon by a second file system. For example:
/a/b (file system 1) /a/b (file system 1)
/a/b/c/d (file system 2) /a/b/c/d (file system 2)
The pseudo file system for this server may be constructed to look The pseudo file system for this server may be constructed to look
like: like:
/ (place holder/not exported) / (place holder/not exported)
/a/b (file system 1) /a/b (file system 1)
/a/b/c/d (file system 2) /a/b/c/d (file system 2)
It is the server's responsibility to present a pseudo file system It is the server's responsibility to present a pseudo file system
that is complete to the client. If the client sends a lookup request that is complete to the client. If the client sends a lookup request
for the path "/a/b/c/d", the server's response is the filehandle of for the path "/a/b/c/d", the server's response is the filehandle of
the file system "/a/b/c/d". In previous versions of the NFS the root of the file system "/a/b/c/d". In previous versions of the
protocol, the server would respond with the filehandle of directory NFS protocol, the server would respond with the filehandle of
"/a/b/c/d" within the file system "/a/b". directory "/a/b/c/d" within the file system "/a/b".
The NFS client will be able to determine if it crosses a server mount The NFS client will be able to determine if it crosses a server mount
point by a change in the value of the "fsid" attribute. point by a change in the value of the "fsid" attribute.
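A minimal sketch of how a client might notice such crossings while
walking a path one component at a time follows. The callable
lookup_fsid is a hypothetical stand-in for a LOOKUP followed by a
GETATTR of the "fsid" attribute; it is not a protocol operation, and
the fsid values used with it are invented for illustration.

```python
def find_crossings(components, lookup_fsid):
    """Return the paths at which the fsid differs from the parent's.

    components is the list of path component names below the root, and
    lookup_fsid(path) returns the fsid of the object at that path
    (any comparable value, such as a (major, minor) tuple).
    """
    crossings = []
    path = "/"
    previous = lookup_fsid(path)
    for name in components:
        path = path.rstrip("/") + "/" + name
        current = lookup_fsid(path)
        # A changed fsid means this component is the root of another
        # file system (real or pseudo) on the server.
        if current != previous:
            crossings.append(path)
        previous = current
    return crossings
```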
7.8. Security Policy and Namespace Presentation 7.8. Security Policy and Namespace Presentation
Because NFSv4 clients possess the ability to change the security Because NFSv4 clients possess the ability to change the security
mechanisms used, after determining what is allowed, by using SECINFO mechanisms used, after determining what is allowed, by using SECINFO
and SECINFO_NONAME, the server SHOULD NOT present a different view of and SECINFO_NONAME, the server SHOULD NOT present a different view of
the namespace based on the security mechanism being used by a client. the namespace based on the security mechanism being used by a client.
Instead, it should present a consistent view and return Instead, it should present a consistent view and return
NFS4ERR_WRONGSEC if an attempt is made to access data with an NFS4ERR_WRONGSEC if an attempt is made to access data with an
inappropriate security mechanism. inappropriate security mechanism.
If security considerations make it necessary to hide the existence of If security considerations make it necessary to hide the existence of
a particular file system, as opposed to all of the data within it, a particular file system, as opposed to all of the data within it,
the server can apply the security policy of a shared resource in the the server can apply the security policy of a shared resource in the
server's namespace to components of the resource's ancestors. For server's namespace to components of the resource's ancestors. For
example: example:
/ / (place holder/not exported)
/a/b /a/b (file system 1)
/a/b/MySecretProject /a/b/MySecretProject (file system 2)
The /a/b/MySecretProject directory is a real file system and is the The /a/b/MySecretProject directory is a real file system and is the
shared resource. Suppose the security policy for /a/b/ shared resource. Suppose the security policy for /a/b/
MySecretProject is Kerberos with integrity and it is desired that MySecretProject is Kerberos with integrity and it is desired that
knowledge of the existence of this file system be very limited. knowledge of the existence of this file system be very limited.
In this case the server should apply the same security policy to In this case the server should apply the same security policy to
/a/b. This allows for knowledge of the existence of a filesystem to be /a/b. This allows for knowledge of the existence of a filesystem to be
secured in cases where this is desirable. secured in cases where this is desirable.
For the case of the use of multiple, disjoint security mechanisms in For the case of the use of multiple, disjoint security mechanisms in
the server's resources, the security for a particular object in the the server's resources, applying that sort of policy would result in
server's namespace should be the union of all security mechanisms of the higher-level file system not being accessible using any security
all direct descendants. A common and convenient practice, unless flavor, which would make that higher-level file system
strong security requirements dictate otherwise, is to make all of the inaccessible. Therefore, that sort of configuration is not
pseudo file system accessible by all of the valid security compatible with hiding the existence (as opposed to the contents)
mechanisms. from clients using multiple disjoint sets of security flavors.
In other circumstances, a desirable policy is for the security of a
particular object in the server's namespace to include the union
of all security mechanisms of all direct descendants. A common and
convenient practice, unless strong security requirements dictate
otherwise, is to make all of the pseudo file system accessible by all
of the valid security mechanisms.
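The union policy can be sketched as follows. The data structures, the
function name, and the flavor labels "sys" and "krb5i" are
assumptions made for illustration. The sketch also assumes that each
child contributes its own effective set, so that the union propagates
upward through every level of the namespace.

```python
def flavors_for(path, own_flavors, children):
    """Return the set of security flavors to accept at path.

    own_flavors maps a path to the flavors configured on it directly;
    children maps a path to the list of its child paths. The result is
    the object's own flavors plus the union of the effective flavors of
    its descendants, computed bottom-up.
    """
    result = set(own_flavors.get(path, ()))
    for child in children.get(path, ()):
        result |= flavors_for(child, own_flavors, children)
    return result
```

This is the opposite choice from the hiding policy described earlier:
here an ancestor accepts every flavor needed to reach any descendant,
so nothing about the existence of the descendants is concealed.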
Where there is concern about the security of data on the wire, Where there is concern about the security of data on the wire,
clients should use strong security mechanisms to access the pseudo clients should use strong security mechanisms to access the pseudo
file system in order to prevent man-in-the-middle-attacks from file system in order to prevent man-in-the-middle-attacks from
directing LOOKUP's within the pseudo-fs from compromising the directing LOOKUPs within the pseudo file system from compromising the
existence of sensitive data, or getting access to data that the existence of sensitive data, or getting access to data that the
client is sending by directing the client to send it using weak client is sending by directing the client to send it using weak
security mechanisms. security mechanisms.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking the protocol becomes substantially more mandatory record locking the protocol becomes substantially more
skipping to change at page 144, line 37 skipping to change at page 146, line 46
Stateids are divided into two fields, a 96-bit "other" field Stateids are divided into two fields, a 96-bit "other" field
identifying the specific set of locks and a 32-bit "seqid" sequence identifying the specific set of locks and a 32-bit "seqid" sequence
value. Except in the case of special stateids, to be discussed value. Except in the case of special stateids, to be discussed
below, a particular value of the "other" field denotes a set of locks below, a particular value of the "other" field denotes a set of locks
of the same type (for example byte-range lock, opens, delegations, or of the same type (for example byte-range lock, opens, delegations, or
layouts), for a specific file or directory, and sharing the same layouts), for a specific file or directory, and sharing the same
ownership characteristics. The seqid designates a specific instance ownership characteristics. The seqid designates a specific instance
of such a set of locks, and is incremented to indicate changes in of such a set of locks, and is incremented to indicate changes in
such a set of locks, either by the addition or deletion of locks from such a set of locks, either by the addition or deletion of locks from
the, a change in the byte-range they apply to, or an upgrade or the set, a change in the byte-range they apply to, or an upgrade or
downgrade in the type of one or more locks. downgrade in the type of one or more locks.
When such a set of locks is first created the server returns a When such a set of locks is first created the server returns a
stateid with seqid value of one. On subsequent operations which stateid with seqid value of one. On subsequent operations which
modify the set of locks the server is required to increment the seqid modify the set of locks the server is required to increment the seqid
field by one (1) whenever it returns a stateid for the same state field by one (1) whenever it returns a stateid for the same state
owner/file/type combination and there is some change in the set of owner/file/type combination and there is some change in the set of
locks actually designated. In this case the server will return a locks actually designated. In this case the server will return a
stateid with an other field the same as previously used for that stateid with an other field the same as previously used for that
state owner/file/type combination, with an incremented seqid field. state owner/file/type combination, with an incremented seqid field.
The purpose of the incrementing of the seqid is to allow the replier The purpose of the incrementing of the seqid is to allow the replier
to communicate to the requester the order in which operations that to communicate to the requester the order in which operations that
modified locking state associated with a stateid have been processed modified locking state associated with a stateid have been processed
and to make it possible for the client to issue requests that are and to make it possible for the client to issue requests that are
conditional on the set of locks not having changed since the stateid conditional on the set of locks not having changed since the stateid
in question was returned. in question was returned.
When stateids are sent to the server by the client, it has two When a client sends a stateid to the server, it has two choices with
choices with regard to the seqid sent. It may set the seqid to zero regard to the seqid sent. It may set the seqid to zero to indicate
to indicate to the server that it wishes the most up-to-date seqid to the server that it wishes the most up-to-date seqid for that
for that stateid's "other" field to be used. This would be the stateid's "other" field to be used. This would be the common choice
common choice in the case of stateid sent with a READ or WRITE in the case of a stateid sent with a READ or WRITE operation. It
operation. It also may set a non-zero value in which case the server also may set a non-zero value in which case the server checks if that
checks if that seqid is the correct one. In that case the server is seqid is the correct one. In that case the server is required to
required to return NFS4ERR_OLD_STATEID if the seqid is lower than the return NFS4ERR_OLD_STATEID if the seqid is lower than the most
most current value and NFS4ERR_BAD_STATEID if the seqid is greater current value and NFS4ERR_BAD_STATEID if the seqid is greater than
than the most current value. This would be the common choice in the the most current value. This would be the common choice in the case
case if stateids sent with a CLOSE or OPEN_DOWNGRADE. Because OPENs of stateids sent with a CLOSE or OPEN_DOWNGRADE. Because OPENs may
may be sent in parallel for the same owner, a client might close a be sent in parallel for the same owner, a client might close a file
file without knowing that an OPEN upgrade had been done by the without knowing that an OPEN upgrade had been done by the server,
server, changing the lock in question. If CLOSE were sent with a changing the lock in question. If CLOSE were sent with a zero seqid,
zero seqid, the OPEN upgrade would be canceled before the client even the OPEN upgrade would be canceled before the client even received an
received an indication that it had happened. indication that an upgrade had happened.
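The seqid rules described above can be summarized in a short sketch.
It is not a server implementation: the function names are
hypothetical, error codes are represented by their names as strings,
and wraparound of the 32-bit seqid is deliberately ignored.

```python
def check_seqid(sent_seqid, current_seqid):
    """Apply the seqid comparison for a non-special stateid.

    sent_seqid is the value supplied by the client; current_seqid is
    the most current value the server holds for the same "other" field.
    """
    if sent_seqid == 0:
        # Zero asks the server to use the most up-to-date seqid.
        return "NFS4_OK"
    if sent_seqid < current_seqid:
        return "NFS4ERR_OLD_STATEID"
    if sent_seqid > current_seqid:
        return "NFS4ERR_BAD_STATEID"
    return "NFS4_OK"


def updated_stateid(other, seqid):
    """Return the stateid handed back after the set of locks changes.

    The "other" field is unchanged and the seqid is incremented by one.
    """
    return (other, seqid + 1)
```

For example, a CLOSE carrying seqid 1 that arrives after a parallel
OPEN upgrade has advanced the seqid to 2 is rejected with
NFS4ERR_OLD_STATEID, whereas the same CLOSE sent with seqid zero would
be accepted and would cancel the upgrade.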
8.2.3. Special Stateids 8.2.3. Special Stateids
Stateid values whose "other" field is either all zeros or all ones Stateid values whose "other" field is either all zeros or all ones
are reserved. They may not be assigned by the server but have are reserved. They may not be assigned by the server but have
special meanings defined by the protocol. The particular meaning special meanings defined by the protocol. The particular meaning
depends on whether the "other" field is all zeros or all ones and the depends on whether the "other" field is all zeros or all ones and the
specific value of the "seqid" field. specific value of the "seqid" field.
The following combinations of "other" and "seqid" are defined in The following combinations of "other" and "seqid" are defined in
skipping to change at page 147, line 37 skipping to change at page 149, line 44
o If the server has restarted resulting in loss of all leased state o If the server has restarted resulting in loss of all leased state
but the sessionid and client ID are still valid, return but the sessionid and client ID are still valid, return
NFS4ERR_STALE_STATEID. (If server restart has resulted in an NFS4ERR_STALE_STATEID. (If server restart has resulted in an
invalid client ID or sessionid, SEQUENCE will return an invalid client ID or sessionid, SEQUENCE will return an
error - not NFS4ERR_STALE_STATEID - and the operation that takes a error - not NFS4ERR_STALE_STATEID - and the operation that takes a
stateid as an argument will never be processed.) stateid as an argument will never be processed.)
o If the "other" field is all zeros or all ones, check that the o If the "other" field is all zeros or all ones, check that the
"other" and "seqid" match a defined combination for a special "other" and "seqid" match a defined combination for a special
stateid and than that stateid can be used in the current context. stateid and then that stateid can be used in the current context.
If not, then return NFS4ERR_BAD_STATEID. If not, then return NFS4ERR_BAD_STATEID.
o If the "seqid" field is not zero, and it is greater than the o If the "seqid" field is not zero, and it is greater than the
current sequence value corresponding to the current "other" field, current sequence value corresponding to the current "other" field,
return NFS4ERR_BAD_STATEID. return NFS4ERR_BAD_STATEID.
o If the "seqid" field is not zero, and it is less than the current o If the "seqid" field is not zero, and it is less than the current
sequence value corresponding to the current "other" field, return sequence value corresponding to the current "other" field, return
NFS4ERR_OLD_STATEID. NFS4ERR_OLD_STATEID.
skipping to change at page 148, line 50 skipping to change at page 151, line 11
otherwise unreachable. It is not a mechanism for cache consistency otherwise unreachable. It is not a mechanism for cache consistency
and lease renewals may not be denied if the lease interval has not and lease renewals may not be denied if the lease interval has not
expired. expired.
Since each session is associated with a specific client, any Since each session is associated with a specific client, any
operation issued on that session is an indication that the associated operation issued on that session is an indication that the associated
client is reachable. When a request is issued for a given session, client is reachable. When a request is issued for a given session,
successful execution of a SEQUENCE operation (or successful retrieval successful execution of a SEQUENCE operation (or successful retrieval
of the result of SEQUENCE from the reply cache) will result in all of the result of SEQUENCE from the reply cache) will result in all
leases for the associated client being implicitly renewed. In leases for the associated client being implicitly renewed. In
addition, whenever a new stateid is created ot updated (i.e. returned addition, whenever a new stateid is created or updated (i.e. returned
with a new seqid value), all leases for the associated client are also with a new seqid value), all leases for the associated client are also
renewed. This approach allows for low overhead lease renewal which renewed. This approach allows for low overhead lease renewal which
scales well. In the typical case no extra RPC calls are required for scales well. In the typical case no extra RPC calls are required for
lease renewal and in the worst case one RPC is required every lease lease renewal and in the worst case one RPC is required every lease
period, via a COMPOUND that consists solely of a single SEQUENCE period, via a COMPOUND that consists solely of a single SEQUENCE
operation. The number of locks held by the client is not a factor operation. The number of locks held by the client is not a factor
since all state for the client is involved with the lease renewal since all state for the client is involved with the lease renewal
action. action.
Since all operations that create a new lease also renew existing Since all operations that create a new lease also renew existing
leases, the server must maintain a common lease expiration time for leases, the server must maintain a common lease expiration time for
all valid leases for a given client. This lease time can then be all valid leases for a given client. This lease time can then be
easily updated upon implicit lease renewal actions. easily updated upon implicit lease renewal actions.
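A minimal sketch of the common lease expiration time described above, assuming a single per-client record that every renewing event updates. ClientLease, on_sequence_ok and on_stateid_changed are hypothetical names, and time is passed in explicitly so the example stays deterministic.

```python
class ClientLease:
    """One expiration time shared by all leases of a single client."""

    def __init__(self, lease_period: float, now: float) -> None:
        self.lease_period = lease_period
        self.expires_at = now + lease_period

    def renew(self, now: float) -> None:
        # Renewal moves the single common expiration time forward.
        self.expires_at = now + self.lease_period

    def expired(self, now: float) -> bool:
        return now >= self.expires_at


def on_sequence_ok(lease: ClientLease, now: float) -> None:
    # A successful SEQUENCE (or a reply-cache hit) renews implicitly.
    lease.renew(now)


def on_stateid_changed(lease: ClientLease, now: float) -> None:
    # Creating a stateid, or returning it with a new seqid, also renews.
    lease.renew(now)
```

Because there is one record per client, the cost of a renewal does not depend on how many locks the client holds.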
8.4. Crash Recovery 8.4. Crash Recovery
The important requirement in crash recovery is that both the client A critical requirement in crash recovery is that both the client and
and the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and client has successfully recovered the locks protecting the READ and
WRITE operations. Any that reach the server before it can safely WRITE operations. Any that reach the server before the server can
determine that it has re-established enough locking state to be sure safely determine that the client has recovered enough locking state
that such requests can be safely processed must be rejected, either to be sure that such operations can be safely processed must be
because the state presented is no longer valid rejected, either because the state presented is no longer valid
(NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID) or because (NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID) or because
subsequent recovery of locks may make execution of the operation subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated leases have expired. Conflicting locks locks when the associated leases have expired. Conflicting locks
from another client may only be granted after this lease expiration. from another client may only be granted after this lease expiration.
When a client has not failed and re-establishes his lease before When a client has not failed and re-establishes his lease before
skipping to change at page 151, line 29 skipping to change at page 153, line 38
example, CREATE_SESSION, DESTROY_SESSION) returns example, CREATE_SESSION, DESTROY_SESSION) returns
NFS4ERR_STALE_CLIENTID. The client MUST establish a new client NFS4ERR_STALE_CLIENTID. The client MUST establish a new client
ID (Section 8.1) and re-establish its lock state ID (Section 8.1) and re-establish its lock state
(Section 8.4.2.1). (Section 8.4.2.1).
8.4.2.1. State Reclaim 8.4.2.1. State Reclaim
When state information and the associated locks are lost as a result When state information and the associated locks are lost as a result
of a server reboot, the protocol must provide a way to cause that of a server reboot, the protocol must provide a way to cause that
state to be re-established. The approach used is to define, for most state to be re-established. The approach used is to define, for most
type of locking state (layouts are an exception, a request whose type of locking state (layouts are an exception), a request whose
function is to allow the client to re-establish on the server a lock function is to allow the client to re-establish on the server a lock
first gotten on a previous instance. Generally these requests are first obtained from a previous instance. Generally these requests
variants of the requests normally used to create locks of that type are variants of the requests normally used to create locks of that
and are referred to as "reclaim-type" requests and the process of re- type and are referred to as "reclaim-type" requests and the process
establishing such locks is referred to as "reclaiming" them. of re-establishing such locks is referred to as "reclaiming" them.
Because each client must have an opportunity to reclaim all of the Because each client must have an opportunity to reclaim all of the
locks that it has without the possibility that some other client will locks that it has without the possibility that some other client will
be granted a conflicting lock, a special period called the "grace be granted a conflicting lock, a special period called the "grace
period" is devoted to the reclaim process. During this period, only period" is devoted to the reclaim process. During this period, only
reclaim-type locking requests are allowed, unless the server is able reclaim-type locking requests are allowed, unless the server is able
to reliably determine (through state persistently maintained across to reliably determine (through state persistently maintained across
reboot instances), that granting any such lock cannot possibly reboot instances), that granting any such lock cannot possibly
conflict with a subsequent reclaim. When a request is made to obtain conflict with a subsequent reclaim. When a request is made to obtain
a new lock (i.e. not a reclaim-type request) during the grace period a new lock (i.e. not a reclaim-type request) during the grace period
and such a determination cannot be made, the server must return the and such a determination cannot be made, the server must return the
error NFS4ERR_GRACE. error NFS4ERR_GRACE.
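The grace-period rule above reduces to a small decision function. The sketch below assumes it is called only while the grace period is in effect. The function name, the parameter names and the "PROCESS" return value are hypothetical, chosen for illustration.

```python
def lock_request_during_grace(is_reclaim: bool, provably_no_conflict: bool) -> str:
    """Decide how a locking request is handled while the grace period runs.

    provably_no_conflict stands for the server's ability to determine,
    from state kept across reboot instances, that granting the lock
    cannot conflict with a subsequent reclaim.
    """
    if is_reclaim:
        return "PROCESS"          # reclaim-type requests are allowed
    if provably_no_conflict:
        return "PROCESS"          # new lock cannot collide with a reclaim
    return "NFS4ERR_GRACE"        # new lock request must be rejected
```

A server that keeps no persistent locking state would always pass provably_no_conflict as false, so every non-reclaim request during the grace period would receive NFS4ERR_GRACE.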
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to true and OPEN operations with a claim type of reclaim set to true and OPEN operations with a claim type of
CLAIM_PREVIOUS. See Section 9.8) to re-establish its locking state. CLAIM_PREVIOUS. See Section 9.8) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client does a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the one_fs argument set to false, to indicate that it has reclaimed the one_fs argument set to false, to indicate that it has reclaimed
all of the locking state that it will reclaim. Once a client does all of the locking state that it will reclaim. Once a client sends
such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking
operations, although it may get NFS4ERR_GRACE errors on these until operations, although it may get NFS4ERR_GRACE errors on the operations
the period of special handling is over. See Section 11.6.7 for a until the period of special handling is over. See Section 11.6.7 for
discussion of the analogous handling of lock reclamation in the case of a discussion of the analogous handling of lock reclamation in the case
filesystems transitioning from server to server. of filesystems transitioning from server to server.
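The client-side sequence described above might be planned as in the following sketch. plan_recovery and the tuple layout are hypothetical. Reclaiming opens before byte-range locks is an assumption made here, since the text does not state an order.

```python
from typing import Any, Dict, List, Sequence, Tuple

Op = Tuple[str, Dict[str, Any]]


def plan_recovery(opens: Sequence[str], locks: Sequence[Tuple[int, int]]) -> List[Op]:
    """List the operations a client sends after a new session is established."""
    ops: List[Op] = []
    # Assumption: opens are reclaimed before the byte-range locks.
    for name in opens:
        ops.append(("OPEN", {"claim": "CLAIM_PREVIOUS", "file": name}))
    for offset, length in locks:
        ops.append(("LOCK", {"reclaim": True, "offset": offset, "length": length}))
    # Always sent, even when there was nothing to reclaim.
    ops.append(("RECLAIM_COMPLETE", {"one_fs": False}))
    return ops
```

Non-reclaim locking operations would follow this list, with the client prepared to retry any that fail with NFS4ERR_GRACE.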
Note that if the client ID persisted through a server reboot (which Note that if the client ID persisted through a server reboot (which
will be self-evident if the client never received a will be self-evident if the client never received a
NFS4ERR_STALE_CLIENTID error, and instead got NFS4ERR_STALE_CLIENTID error, and instead got
SEQ4_STATUS_RESTART_RECLAIM_NEEDED status from SEQUENCE SEQ4_STATUS_RESTART_RECLAIM_NEEDED status from SEQUENCE
(Section 18.46.4)), no client ID was re-established. See Paragraph 2 (Section 18.46.4)), no client ID was re-established. See Paragraph 2
of Section 9.8 for discussion of some restrictions on use of upgrade of Section 9.8 for discussion of some restrictions on use of upgrade
semantics in connection with reclaim that are the result of some semantics in connection with reclaim that are the result of some
issues that apply to this situation. issues that apply to this situation.
skipping to change at page 168, line 52 skipping to change at page 171, line 11
When multiple open files on the client are merged into a single open When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the file object on the server, the close of one of the open files (on the
client) may necessitate change of the access and deny status of the client) may necessitate change of the access and deny status of the
open file on the server. This is because the union of the access and open file on the server. This is because the union of the access and
deny bits for the remaining opens may be smaller (i.e. a proper deny bits for the remaining opens may be smaller (i.e. a proper
subset) than previously. The OPEN_DOWNGRADE operation is used to subset) than previously. The OPEN_DOWNGRADE operation is used to
make the necessary change and the client should use it to update the make the necessary change and the client should use it to update the
server so that share reservation requests by other clients are server so that share reservation requests by other clients are
handled properly. handled properly.
Because of the possibility that the client will issue multiple open Because of the possibility that the client will issue multiple opens
for the same owner in parallel, it may be the case that an open for the same owner in parallel, it may be the case that an open
upgrade ay happen without the client knowing beforehand that this upgrade may happen without the client knowing beforehand that this
could happen. Because of this possibility, CLOSEs and could happen. Because of this possibility, CLOSEs and
OPEN_DOWNGRADEs should generally be issued with a non-zero seqid in OPEN_DOWNGRADEs should generally be issued with a non-zero seqid in
the stateid, to avoid the possibility that the status change the stateid, to avoid the possibility that the status change
associated with an open upgrade is not inadvertanyly lost. associated with an open upgrade is not inadvertently lost.
9.8. Reclaim of Open and Byte-range Locks 9.8. Reclaim of Open and Byte-range Locks
Special forms of the LOCK and OPEN operations are provided when it is Special forms of the LOCK and OPEN operations are provided when it is
necessary to re-establish byte-range locks or opens after a server necessary to re-establish byte-range locks or opens after a server
failure. failure.
o To reclaim existing opens, an OPEN operation is performed using a o To reclaim existing opens, an OPEN operation is performed using a
CLAIM_PREVIOUS. Because the client, in this type of situation, CLAIM_PREVIOUS. Because the client, in this type of situation,
will have already opened the file and have the filehandle of the will have already opened the file and have the filehandle of the
skipping to change at page 186, line 47 skipping to change at page 189, line 27
delegation voluntarily. The following items of state need to be delegation voluntarily. The following items of state need to be
dealt with: dealt with:
o If the file associated with the delegation is no longer open and o If the file associated with the delegation is no longer open and
no previous CLOSE operation has been sent to the server, a CLOSE no previous CLOSE operation has been sent to the server, a CLOSE
operation must be sent to the server. operation must be sent to the server.
o If a file has other open references at the client, then OPEN o If a file has other open references at the client, then OPEN
operations must be sent to the server. The appropriate stateids operations must be sent to the server. The appropriate stateids
will be provided by the server for subsequent use by the client will be provided by the server for subsequent use by the client
since the delegation stateid will not longer be valid. These OPEN since the delegation stateid will no longer be valid. These OPEN
requests are done with the claim type of CLAIM_DELEGATE_CUR. This requests are done with the claim type of CLAIM_DELEGATE_CUR. This
will allow the presentation of the delegation stateid so that the will allow the presentation of the delegation stateid so that the
client can establish the appropriate rights to perform the OPEN. client can establish the appropriate rights to perform the OPEN.
(See Section 18.16, which describes the OPEN operation, for (See Section 18.16, which describes the OPEN operation, for
details.) details.)
o If there are granted file locks, the corresponding LOCK operations o If there are granted file locks, the corresponding LOCK operations
need to be performed. This applies to the write open delegation need to be performed. This applies to the write open delegation
case only. case only.
o For a write open delegation, if at the time of recall the file is o For a write open delegation, if at the time of recall the file is
not open for write, all modified data for the file must be flushed not open for write, all modified data for the file must be flushed
to the server. If the delegation had not existed, the client to the server. If the delegation had not existed, the client
would have done this data flush before the CLOSE operation. would have done this data flush before the CLOSE operation.
o For a write open delegation when a file is still open at the time o For a write open delegation when a file is still open at the time
skipping to change at page 197, line 10 skipping to change at page 199, line 32
operation change attribute values atomically. When the server is operation change attribute values atomically. When the server is
unable to report the before and after values atomically with respect unable to report the before and after values atomically with respect
to the directory operation, the server must indicate that fact in the to the directory operation, the server must indicate that fact in the
change_info4 return value. When the information is not atomically change_info4 return value. When the information is not atomically
reported, the client should not assume that other clients have not reported, the client should not assume that other clients have not
changed the directory. changed the directory.
11. Multi-Server Namespace 11. Multi-Server Namespace
NFSv4.1 supports attributes that allow a namespace to extend beyond NFSv4.1 supports attributes that allow a namespace to extend beyond
the boundaries of a single server. Use of such multi-server the boundaries of a single server. It is recommended that clients
namespaces is optional, and for many purposes, single-server and servers support construction of such multi-server namespaces.
namespaces are perfectly acceptable. Use of multi-server namespaces Use of such multi-server namespaces is OPTIONAL however, and for many
can provide many advantages, however, by separating a file system's purposes, single-server namespaces are perfectly acceptable. Use of
logical position in a namespace from the (possibly changing) multi-server namespaces can provide many advantages, however, by
logistical and administrative considerations that result in separating a file system's logical position in a namespace from the
particular file systems being located on particular servers. (possibly changing) logistical and administrative considerations that
result in particular file systems being located on particular
servers.
11.1. Location attributes 11.1. Location Attributes
NFSv4 contains recommended attributes that allow file systems on one NFSv4 contains recommended attributes that allow file systems on one
server to be associated with one or more instances of that file server to be associated with one or more instances of that file
system on other servers. These attributes specify such file systems system on other servers. These attributes specify such file systems
by specifying a server name (either a DNS name or an IP address) by specifying a server name (either a DNS name or an IP address)
together with the path of that file system within that server's together with the path of that file system within that server's
single-server namespace. single-server namespace.
The fs_locations_info recommended attribute allows specification of The fs_locations_info RECOMMENDED attribute allows specification of
one more file systems instance locations where the data corresponding one or more filesystem instance locations where the data
to a given file system may be found. This attribute provides to the corresponding to a given file system may be found. This attribute
client, in addition to information about file system instance provides to the client, in addition to information about file system
locations, extensive information about the various file system instance locations, significant information about the various file
instance choices (e.g. priority for use, writability, currency, etc.) system instance choices (e.g. priority for use, writability,
as well as information to help the client efficiently effect as currency, etc.). It also includes information to help the client
seamless a transition as possible among multiple file system efficiently effect as seamless a transition as possible among
instances, when and if that should be necessary. multiple file system instances, when and if that should be necessary.
The fs_locations recommended attribute is inherited from NFSv4.0 and The fs_locations RECOMMENDED attribute is inherited from NFSv4.0 and
only allows specification of the file system locations where the data only allows specification of the file system locations where the data
corresponding to a given file system may be found. Servers should corresponding to a given file system may be found. Servers SHOULD
make this attribute available whenever fs_locations_info is make this attribute available whenever fs_locations_info is
supported, but client use of fs_locations_info is to be preferred. supported, but client use of fs_locations_info is to be preferred.
11.2. File System Presence or Absence 11.2. File System Presence or Absence
A given location in an NFSv4 namespace (typically but not necessarily A given location in an NFSv4 namespace (typically but not necessarily
a multi-server namespace) can have a number of file system instance a multi-server namespace) can have a number of file system instance
locations associated with it (via the fs_locations or locations associated with it (via the fs_locations or
fs_locations_info attribute). There may also be an actual current fs_locations_info attribute). There may also be an actual current
file system at that location, accessible via normal namespace file system at that location, accessible via normal namespace
operations (e.g. LOOKUP). In this case, the file system is said to operations (e.g. LOOKUP). In this case, the file system is said to
be "present" at that position in the namespace and clients will be "present" at that position in the namespace and clients will
typically use it, reserving use of additional locations specified via typically use it, reserving use of additional locations specified via
the location-related attributes to situations in which the principal the location-related attributes to situations in which the principal
location is no longer available. location is no longer available.
When there is no actual file system at the namespace location in When there is no actual file system at the namespace location in
question, the file system is said to be "absent". An absent file question, the file system is said to be "absent". An absent file
system contains no files or directories other than the root and any system contains no files or directories other than the root. Any
reference to it, except to access a small set of attributes useful in reference to it, except to access a small set of attributes useful in
determining alternate locations, will result in an error, determining alternate locations, will result in an error,
NFS4ERR_MOVED. Note that if the server ever returns NFS4ERR_MOVED NFS4ERR_MOVED. Note that if the server ever returns the error
(i.e. file systems may be absent), it MUST support the fs_locations NFS4ERR_MOVED, it MUST support the fs_locations attribute and SHOULD
attribute and SHOULD support the fs_locations_info and fs_absent support the fs_locations_info and fs_status attributes.
attributes.
While the error name suggests that we have a case of a file system While the error name suggests that we have a case of a file system
which once was present, and has only become absent later, this is which once was present, and has only become absent later, this is
only one possibility. A position in the namespace may be permanently only one possibility. A position in the namespace may be permanently
absent with the file system(s) designated by the location attributes absent with the set of file system(s) designated by the location
the only realization. The name NFS4ERR_MOVED reflects an earlier, attributes being the only realization. The name NFS4ERR_MOVED
more limited conception of its function, but this error will be reflects an earlier, more limited conception of its function, but
returned whenever the referenced file system is absent, whether it this error will be returned whenever the referenced file system is
has moved or not. absent, whether it has moved or not.
Except in the case of GETATTR-type operations (to be discussed Except in the case of GETATTR-type operations (to be discussed
later), when the current filehandle at the start of an operation is later), when the current filehandle at the start of an operation is
within an absent file system, that operation is not performed and the within an absent file system, that operation is not performed and the
error NFS4ERR_MOVED is returned, to indicate that the file system is error NFS4ERR_MOVED is returned, to indicate that the file system is
absent on the current server. absent on the current server.
Because a GETFH cannot succeed if the current filehandle is within an Because a GETFH cannot succeed if the current filehandle is within an
absent file system, filehandles within an absent file system cannot absent file system, filehandles within an absent file system cannot
be transferred to the client. When a client does have filehandles be transferred to the client. When a client does have filehandles
within an absent file system, it is the result of obtaining them when within an absent file system, it is the result of obtaining them when
the file system was present, and having the file system become absent the file system was present, and having the file system become absent
subsequently. subsequently.
It should be noted that because the check for the current filehandle It should be noted that because the check for the current filehandle
being within an absent file system happens at the start of every being within an absent file system happens at the start of every
operation, operations which change the current filehandle so that it operation, operations that change the current filehandle so that it
is within an absent file system will not result in an error. This is within an absent file system will not result in an error. This
allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be
used to get attribute information, particularly location attribute used to get attribute information, particularly location attribute
information, as discussed below. information, as discussed below.
The recommended file system attribute fs_absent can used to The recommended file system attribute fs_status can be used to
interrogate the present/absent status of a given file system. interrogate the present/absent status of a given file system.
11.3. Getting Attributes for an Absent File System 11.3. Getting Attributes for an Absent File System
When a file system is absent, most attributes are not available, but When a file system is absent, most attributes are not available, but
it is necessary to allow the client access to the small set of it is necessary to allow the client access to the small set of
attributes that are available, and most particularly those that give attributes that are available, and most particularly those that give
information about the correct current locations for this file system, information about the correct current locations for this file system,
fs_locations and fs_locations_info. fs_locations and fs_locations_info.
11.3.1. GETATTR Within an Absent File System 11.3.1. GETATTR Within an Absent File System
As mentioned above, an exception is made for GETATTR in that As mentioned above, an exception is made for GETATTR in that
attributes may be obtained for a filehandle within an absent file attributes may be obtained for a filehandle within an absent file
system. This exception only applies if the attribute mask contains system. This exception only applies if the attribute mask contains
at least one attribute bit that indicates the client is interested in at least one attribute bit that indicates the client is interested in
a result regarding an absent file system: fs_locations, a result regarding an absent file system: fs_locations,
fs_locations_info, or fs_absent. If none of these attributes is fs_locations_info, or fs_status. If none of these attributes is
requested, GETATTR will result in an NFS4ERR_MOVED error. requested, GETATTR will result in an NFS4ERR_MOVED error.
When a GETATTR is done on an absent file system, the set of supported When a GETATTR is done on an absent file system, the set of supported
attributes is very limited. Many attributes, including those that attributes is very limited. Many attributes, including those that
are normally mandatory will not be available on an absent file are normally mandatory, will not be available on an absent file
system. In addition to the attributes mentioned above (fs_locations, system. In addition to the attributes mentioned above (fs_locations,
fs_locations_info, fs_absent), the following attributes SHOULD be fs_locations_info, fs_status), the following attributes SHOULD be
available on absent file systems, in the case of recommended available on absent file systems, in the case of recommended
attributes at least to the same degree that they are available on attributes at least to the same degree that they are available on
present file systems. present file systems.
change: This attribute is useful for absent file systems and can be change_policy: This attribute is useful for absent file systems and
helpful in summarizing to the client when any of the location- can be helpful in summarizing to the client when any of the
related attributes changes. location-related attributes changes.
fsid: This attribute should be provided so that the client can fsid: This attribute should be provided so that the client can
determine file system boundaries, including, in particular, the determine file system boundaries, including, in particular, the
boundary between present and absent file systems. boundary between present and absent file systems. This value must
be different from any other fsid on the current server and need
have no particular relationship to fsids on any particular
destination to which the client might be directed.
mounted_on_fileid: For objects at the top of an absent file system mounted_on_fileid: For objects at the top of an absent file system
this attribute needs to be available. Since the fileid is one this attribute needs to be available. Since the fileid is one
which is within the present parent file system, there should be no which is within the present parent file system, there should be no
need to reference the absent file system to provide this need to reference the absent file system to provide this
information. information.
Other attributes SHOULD NOT be made available for absent file Other attributes SHOULD NOT be made available for absent file
systems, even when it is possible to provide them. The server should systems, even when it is possible to provide them. The server should
not assume that more information is always better and should avoid not assume that more information is always better and should avoid
gratuitously providing additional information. gratuitously providing additional information.
When a GETATTR operation includes a bit mask for one of the When a GETATTR operation includes a bit mask for one of the
attributes fs_locations, fs_locations_info, or absent, but where the attributes fs_locations, fs_locations_info, or fs_status, but where
bit mask includes attributes which are not supported, GETATTR will the bit mask includes attributes which are not supported, GETATTR
not return an error, but will return the mask of the actual will not return an error, but will return the mask of the actual
attributes supported with the results. attributes supported with the results.
Handling of VERIFY/NVERIFY is similar to GETATTR in that if the Handling of VERIFY/NVERIFY is similar to GETATTR in that if the
attribute mask does not include fs_locations, fs_locations_info, or attribute mask does not include fs_locations, fs_locations_info, or
fs_absent, the error NFS4ERR_MOVED will result. It differs in that fs_status, the error NFS4ERR_MOVED will result. It differs in that
any appearance in the attribute mask of an attribute not supported any appearance in the attribute mask of an attribute not supported
for an absent file system (and note that this will include some for an absent file system (and note that this will include some
normally mandatory attributes), will also cause an NFS4ERR_MOVED normally mandatory attributes), will also cause an NFS4ERR_MOVED
result. result.
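The difference between GETATTR and VERIFY/NVERIFY on an absent file
system can be sketched as follows. This is a minimal illustration
that follows the -14 (right-hand) text and is not part of the
protocol definition. The function names, the string result codes, and
the exact contents of ABSENT_FS_ATTRS are assumptions made for the
example.

```python
# Attributes whose presence in the mask suppresses NFS4ERR_MOVED.
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})

# Assumption: the attributes a server can supply for an absent file
# system are the location attributes plus fsid and mounted_on_fileid.
ABSENT_FS_ATTRS = LOCATION_ATTRS | {"fsid", "mounted_on_fileid"}


def getattr_on_absent_fs(mask):
    """Return (status, attributes actually returned) for GETATTR."""
    mask = set(mask)
    if not (mask & LOCATION_ATTRS):
        return ("NFS4ERR_MOVED", set())
    # Unsupported attributes are silently dropped from the result mask.
    return ("NFS4_OK", mask & ABSENT_FS_ATTRS)


def verify_on_absent_fs(mask):
    """Return NFS4ERR_MOVED, or "COMPARE" if the comparison may proceed."""
    mask = set(mask)
    if not (mask & LOCATION_ATTRS):
        return "NFS4ERR_MOVED"
    # Unlike GETATTR, any unsupported attribute is an error.
    if mask - ABSENT_FS_ATTRS:
        return "NFS4ERR_MOVED"
    return "COMPARE"
```

For a mask of {size, fs_locations}, the GETATTR sketch succeeds and
returns only fs_locations, while the VERIFY sketch reports
NFS4ERR_MOVED because size is not available.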
11.3.2. READDIR and Absent File Systems 11.3.2. READDIR and Absent File Systems
A READDIR performed when the current filehandle is within an absent A READDIR performed when the current filehandle is within an absent
file system will result in an NFS4ERR_MOVED error, since, unlike the file system will result in an NFS4ERR_MOVED error, since, unlike the
case of GETATTR, no such exception is made for READDIR. case of GETATTR, no such exception is made for READDIR.
Attributes for an absent file system may be fetched via a READDIR for Attributes for an absent file system may be fetched via a READDIR for
a directory in a present file system, when that directory contains a directory in a present file system, when that directory contains
the root directories of one or more absent file systems. In this the root directories of one or more absent file systems. In this
case, the handling is as follows: case, the handling is as follows:
o If the attribute set requested includes one of the attributes o If the attribute set requested includes one of the attributes
fs_locations, fs_locations_info, or fs_absent, then fetching of fs_locations, fs_locations_info, or fs_status, then fetching of
attributes proceeds normally and no NFS4ERR_MOVED indication is attributes proceeds normally and no NFS4ERR_MOVED indication is
returned, even when the rdattr_error attribute is requested. returned, even when the rdattr_error attribute is requested.
o If the attribute set requested does not include one of the o If the attribute set requested does not include one of the
attributes fs_locations, fs_locations_info, or fs_absent, then if attributes fs_locations, fs_locations_info, or fs_status, then if
the rdattr_error attribute is requested, each directory entry for the rdattr_error attribute is requested, each directory entry for
the root of an absent file system, will report NFS4ERR_MOVED as the root of an absent file system, will report NFS4ERR_MOVED as
the value of the rdattr_error attribute. the value of the rdattr_error attribute.
o If the attribute set requested does not include any of the o If the attribute set requested does not include any of the
attributes fs_locations, fs_locations_info, fs_absent, or attributes fs_locations, fs_locations_info, fs_status, or
rdattr_error then the occurrence of the root of an absent file rdattr_error then the occurrence of the root of an absent file
system within the directory will result in the READDIR failing system within the directory will result in the READDIR failing
with an NFSERR_MOVED error. with an NFS4ERR_MOVED error.
o The unavailability of an attribute because of a file system's o The unavailability of an attribute because of a file system's
absence, even one that is ordinarily mandatory, does not result in absence, even one that is ordinarily mandatory, does not result in
any error indication. The set of attributes returned for the root any error indication. The set of attributes returned for the root
directory of the absent file system in that case is simply directory of the absent file system in that case is simply
restricted to those actually available. restricted to those actually available.
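The READDIR cases listed above can be summarized by a small decision
function. This is a sketch only, following the -14 (right-hand) text.
The names Outcome and readdir_absent_root_outcome are hypothetical,
and attribute names are modeled as plain strings rather than bitmap
positions.

```python
from enum import Enum

LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})


class Outcome(Enum):
    # Entry is returned with whatever attributes are available; no error.
    RETURN_AVAILABLE_ATTRS = 1
    # Entry is returned with rdattr_error set to NFS4ERR_MOVED.
    ENTRY_RDATTR_ERROR_MOVED = 2
    # The READDIR operation as a whole fails with NFS4ERR_MOVED.
    READDIR_FAILS_MOVED = 3


def readdir_absent_root_outcome(requested):
    """Classify handling of one directory entry that is the root of an
    absent file system, given the requested attribute names."""
    requested = set(requested)
    if requested & LOCATION_ATTRS:
        return Outcome.RETURN_AVAILABLE_ATTRS
    if "rdattr_error" in requested:
        return Outcome.ENTRY_RDATTR_ERROR_MOVED
    return Outcome.READDIR_FAILS_MOVED
```

Note that the first test takes precedence: when a location attribute
is requested, no NFS4ERR_MOVED indication is produced even if
rdattr_error is also in the request.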
11.4. Uses of Location Information 11.4. Uses of Location Information
The location-bearing attributes (fs_locations and fs_locations_info), The location-bearing attributes (fs_locations and fs_locations_info),
provide, together with the possibility of absent file systems, a provide, together with the possibility of absent file systems, a
number of important facilities in providing reliable, manageable, and number of important facilities in providing reliable, manageable, and
scalable data access. scalable data access.
When a file system is present, these attribute can provide When a file system is present, these attributes can provide
alternative locations, to be used to access the same data, in the alternative locations, to be used to access the same data, in the
event that server failures, communications problems, or other event of server failures, communications problems, or other
difficulties, make continued access to the current file system difficulties that make continued access to the current file system
impossible or otherwise impractical. Under some circumstances impossible or otherwise impractical. Under some circumstances
multiple alternative locations may be used simultaneously to provide multiple alternative locations may be used simultaneously to provide
higher performance access to the file system in question. Provision higher performance access to the file system in question. Provision
of such alternate locations is referred to as "replication" although of such alternate locations is referred to as "replication" although
there are cases in which replicated sets of data are not in fact there are cases in which replicated sets of data are not in fact
present, and the replicas are instead different paths to the same present, and the replicas are instead different paths to the same
data. data.
When a file system is present and becomes absent, clients can be When a file system is present and becomes absent, clients can be
given the opportunity to have continued access to their data, at an given the opportunity to have continued access to their data, at an
alternate location. In this case, a continued attempt to use the alternate location. In this case, a continued attempt to use the
data in the now-absent file system will result in an NFSERR_MOVED data in the now-absent file system will result in an NFS4ERR_MOVED
error and at that point the successor locations (typically only one error and at that point the successor locations (typically only one
but multiple choices are possible) can be fetched and used to but multiple choices are possible) can be fetched and used to
continue access. Transfer of the file system contents to the new continue access. Transfer of the file system contents to the new
location is referred to as "migration", but it should be kept in mind location is referred to as "migration", but it should be kept in mind
that there are cases in which this term can be used, like that there are cases in which this term can be used, like
"replication", when there is no actual data migration per se. "replication", when there is no actual data migration per se.
Where a file system was not previously present, specification of file Where a file system was not previously present, specification of file
system location provides a means by which file systems located on one system location provides a means by which file systems located on one
server can be associated with a namespace defined by another server, server can be associated with a namespace defined by another server,
thus allowing a general multi-server namespace facility. Designation thus allowing a general multi-server namespace facility. Designation
of such a location, in place of an absent file system, is called of such a location, in place of an absent file system, is called
"referral". "referral".
Because client support for location-related attributes is OPTIONAL, a
server may (but is not required to) take action to hide migration and
referral events from such clients, by acting as a proxy, for example.
The server can determine the presence of client support from data passed
in the EXCHANGE_ID operation (See Section 18.35.4).
11.4.1. File System Replication 11.4.1. File System Replication
The fs_locations and fs_locations_info attributes provide alternative The fs_locations and fs_locations_info attributes provide alternative
locations, to be used to access data in place of or in a addition to locations, to be used to access data in place of or in addition to
the current file system instance. On first access to a file system, the current file system instance. On first access to a file system,
the client should obtain the value of the set alternate locations by the client should obtain the value of the set of alternate locations
interrogating the fs_locations or fs_locations_info attribute, with by interrogating the fs_locations or fs_locations_info attribute,
the latter being preferred. with the latter being preferred.
In the event that server failures, communications problems, or other In the event that server failures, communications problems, or other
difficulties, make continued access to the current file system difficulties make continued access to the current file system
impossible or otherwise impractical, the client can use the alternate impossible or otherwise impractical, the client can use the alternate
locations as a way to get continued access to his data. Depending on locations as a way to get continued access to his data. Depending on
specific attributes of these alternate locations, as indicated within specific attributes of these alternate locations, as indicated within
the fs_locations_info attribute, multiple locations may be used the fs_locations_info attribute, multiple locations may be used
simultaneously, to provide higher performance through the simultaneously, to provide higher performance through the
exploitation of multiple paths between client and target file system. exploitation of multiple paths between client and target file system.
The alternate locations may be physical replicas of the (typically The alternate locations may be physical replicas of the (typically
read-only) file system data, or they may reflect alternate paths to read-only) file system data, or they may reflect alternate paths to
the same server or provide for the use of various form of server the same server or provide for the use of various forms of server
clustering in which multiple servers provide alternate ways of clustering in which multiple servers provide alternate ways of
accessing the same physical file system. How these different modes accessing the same physical file system. How these different modes
of file system transition are represented within the fs_locations and of file system transition are represented within the fs_locations and
fs_locations_info attributes and how the client deals with file fs_locations_info attributes and how the client deals with file
system transition issues will be discussed in detail below. system transition issues will be discussed in detail below.
When multiple server addresses correspond to the same actual server, Multiple server addresses may correspond to the same actual server,
as shown by a common so_major_id field within the eir_server_owner as shown by a common so_major_id field within the eir_server_owner
field returned by EXCHANGE_ID, the client may assume that for each field returned by EXCHANGE_ID (see Section 18.35.4). When such
file system in the namespace of a given server network address, there server addresses exist, the client may assume that for each file
system in the namespace of a given server network address, there
exist file systems at corresponding namespace locations for each of exist file systems at corresponding namespace locations for each of
the other server network addresses, even in the absence of explicit the other server network addresses. It may do this even in the
listing in fs_locations and fs_locations_info. Such corresponding absence of explicit listing in fs_locations and fs_locations_info.
file system locations can be used as alternate locations, just as Such corresponding file system locations can be used as alternate
those explicitly specified via the fs_locations and fs_locations_info locations, just as those explicitly specified via the fs_locations
attributes. Where these specific locations are designated in the and fs_locations_info attributes. Where these specific locations are
fs_locations_info attribute, the conditions of use specified in this designated in the fs_locations_info attribute, the conditions of use
attribute (e.g. priorities, specification of simultaneous use) may specified in this attribute (e.g. priorities, specification of
limit the clients use of these alternate locations. simultaneous use) may limit the client's use of these alternate
locations.
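The inference of implicit alternate locations from a shared
so_major_id can be sketched as follows. This is an illustration under
stated assumptions: the client is assumed to keep a table mapping each
server network address to the so_major_id it obtained from
EXCHANGE_ID, a "corresponding namespace location" is modeled as the
same path on the other address, and the function name is hypothetical.

```python
def implied_alternates(major_ids, address, fs_path):
    """Return (address, path) pairs that may be treated as alternate
    locations for fs_path, based only on a common so_major_id.

    major_ids: dict mapping server network address -> so_major_id
    address:   the address currently used to reach the file system
    fs_path:   the file system's location in that server's namespace
    """
    owner = major_ids[address]
    return [
        (other, fs_path)
        for other, major_id in sorted(major_ids.items())
        if major_id == owner and other != address
    ]
```

Any conditions of use given in fs_locations_info (priorities,
simultaneous use) would still need to be applied to the result; the
sketch does not model them.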
When multiple replicas exist and are used simultaneously or in
succession by a client, they must designate the same data (with
metadata being the same to the degree indicated by the
fs_locations_info attribute). Where file systems are writable, a
change made on one instance must be visible on all instances,
immediately upon the earlier of the return of the modifying request
or the visibility of that change on any of the associated replicas.
Where a file system is not writable but represents a read-only copy
(possibly periodically updated) of a writable file system, similar
requirements apply to the propagation of updates. It must be
guaranteed that any change visible on the original file system
instance must be immediately visible on any replica before the client
transitions access to that replica, to avoid any possibility that a
client, in effecting a transition to a replica, will see any reversion
in file system state. The specific means by which this will be
prevented varies based on the fs4_status_type reported as part of the
fs_status attribute (see Section 11.11).
11.4.2. File System Migration 11.4.2. File System Migration
When a file system is present and becomes absent, clients can be When a file system is present and becomes absent, clients can be
given the opportunity to have continued access to their data, at an given the opportunity to have continued access to their data, at an
alternate location, as specified by the fs_locations or alternate location, as specified by the fs_locations or
fs_locations_info attribute. Typically, a client will be accessing fs_locations_info attribute. Typically, a client will be accessing
the file system in question, get an NFS4ERR_MOVED error, and then use the file system in question, get an NFS4ERR_MOVED error, and then use
the fs_locations or fs_locations_info attribute to determine the new the fs_locations or fs_locations_info attribute to determine the new
location of the data. When fs_locations_info is used, additional location of the data. When fs_locations_info is used, additional
skipping to change at page 203, line 34 skipping to change at page 205, line 46
The new location may be an alternate communication path to the same The new location may be an alternate communication path to the same
server, or, in the case of various forms of server clustering, server, or, in the case of various forms of server clustering,
another server providing access to the same physical file system. another server providing access to the same physical file system.
The client's responsibilities in dealing with this transition depend The client's responsibilities in dealing with this transition depend
on the specific nature of the new access path and how and whether on the specific nature of the new access path and how and whether
data was in fact migrated. These issues will be discussed in detail data was in fact migrated. These issues will be discussed in detail
below. below.
When multiple server addresses correspond to the same actual server, When multiple server addresses correspond to the same actual server,
as shown by a common value for so_major_id field of the as shown by a common value for the so_major_id field of the
eir_server_owner field returned by EXCHANGE_ID, the location or eir_server_owner field returned by EXCHANGE_ID, the location or
locations may designate alternate server addresses in the form of locations may designate alternate server addresses in the form of
specific server network addresses, when the file system in question specific server network addresses. These could be used to access the
is available at those addresses, and no longer accessible at the file system in question at those addresses when it is no longer
original address. accessible at the original address.
Although a single successor location is typical, multiple locations Although a single successor location is typical, multiple locations
may be provided, together with information that allows priority among may be provided, together with information that allows priority among
the choices to be indicated, via information in the fs_locations_info the choices to be indicated, via information in the fs_locations_info
attribute. Where suitable clustering mechanisms make it possible to attribute. Where suitable clustering mechanisms make it possible to
provide multiple identical file systems or paths to them, this allows provide multiple identical file systems or paths to them, this allows
the client the opportunity to deal with any resource or the client the opportunity to deal with any resource or
communications issues that might limit data availability. communications issues that might limit data availability.
When an alternate location is designated as the target for migration, When an alternate location is designated as the target for migration,
skipping to change at page 204, line 14 skipping to change at page 206, line 27
be visible on all migration targets. Where a file system is not be visible on all migration targets. Where a file system is not
writable but represents a read-only copy (possibly periodically writable but represents a read-only copy (possibly periodically
updated) of a writable file system, similar requirements apply to the updated) of a writable file system, similar requirements apply to the
propagation of updates. Any change visible in the original file propagation of updates. Any change visible in the original file
system must already be effected on all migration targets, to avoid system must already be effected on all migration targets, to avoid
any possibility, that a client in effecting a transition to the any possibility, that a client in effecting a transition to the
migration target will see any reversion in file system state. migration target will see any reversion in file system state.
11.4.3. Referrals 11.4.3. Referrals
Referrals provide a way of placing a file system in a location Referrals provide a way of placing a file system in a location within
essentially without respect to its physical location on a given the namespace essentially without respect to its physical location on
server. This allows a single server of a set of servers to present a a given server. This allows a single server or a set of servers to
multi-server namespace that encompasses file systems located on present a multi-server namespace that encompasses file systems
multiple servers. Some likely uses of this include establishment of located on multiple servers. Some likely uses of this include
site-wide or organization-wide namespaces, or even knitting such establishment of site-wide or organization-wide namespaces, or even
together into a truly global namespace. knitting such together into a truly global namespace.
Referrals occur when a client determines, upon first referencing a Referrals occur when a client determines, upon first referencing a
position in the current namespace, that it is part of a new file position in the current namespace, that it is part of a new file
system and that that file system is absent. When this occurs, system and that that file system is absent. When this occurs,
typically by receiving the error NFS4ERR_MOVED, the actual location typically by receiving the error NFS4ERR_MOVED, the actual location
or locations of the file system can be determined by fetching the or locations of the file system can be determined by fetching the
fs_locations or fs_locations_info attribute. fs_locations or fs_locations_info attribute.
The locations-related attribute may designate a single file system The locations-related attribute may designate a single file system
location or multiple file system locations, to be selected based on location or multiple file system locations, to be selected based on
the needs of the client. The server, in the fs_locations_info the needs of the client. The server, in the fs_locations_info
attribute may specify priorities to be associated with various file attribute may specify priorities to be associated with various file
system location choices. The server may assign different priorities system location choices. The server may assign different priorities
to different locations as reported to individual clients, in order to to different locations as reported to individual clients, in order to
adapt to client physical location or to effect load balancing. When adapt to client physical location or to effect load balancing. When
both read-only and read-write file systems are present, some of the both read-only and read-write file systems are present, some of the
read-only locations may not absolutely up-to-date (as they would have read-only locations may not be absolutely up-to-date (as they would
to be in the case of replication and migration). Servers may also have to be in the case of replication and migration). Servers may
specify file system locations that include client-substituted also specify file system locations that include client-substituted
variable so that different clients are referred to different file variables so that different clients are referred to different file
systems (with different data contents) based on client attributes systems (with different data contents) based on client attributes
such as cpu architecture. such as CPU architecture.
When the fs_locations_info attribute indicates that there are
multiple possible targets listed, the relationships among them may be
important to the client in selecting the one to use. The same rules
specified in Section 11.4.1 defining the appropriate standards for
the data propagation, apply to these multiple replicas as well. For
example, the client might prefer a writable target that has additional
writable replicas to which it subsequently might switch. Note that,
as distinguished from the case of replication, there is no need to
deal with the case of propagation of updates made by the current
client, since the current client has not accessed the filesystem in
question.
Use of multi-server namespaces is enabled by NFSv4 but is not Use of multi-server namespaces is enabled by NFSv4 but is not
required. The use of multi-server namespaces and their scope will required. The use of multi-server namespaces and their scope will
depend on the applications used, and system administration depend on the applications used, and system administration
preferences. preferences.
Multi-server namespaces can be established by a single server Multi-server namespaces can be established by a single server
providing a large set of referrals to all of the included file providing a large set of referrals to all of the included file
systems. Alternatively, a single multi-server namespace may be systems. Alternatively, a single multi-server namespace may be
administratively segmented with separate referral file systems (on administratively segmented with separate referral file systems (on
separate servers) for each separately-administered section of the separate servers) for each separately-administered section of the
namespace. Any segment or the top-level referral file system may use namespace. Any segment or the top-level referral file system may use
replicated referral file systems for higher availability. replicated referral file systems for higher availability.
Generally, multi-server namespaces are for the most part uniform, in Generally, multi-server namespaces are for the most part uniform, in
that the same data made available to one client at a given location that the same data made available to one client at a given location
in the namespace is made availably to all clients at that location. in the namespace is made available to all clients at that location.
There are however facilities provided which allow different client to There are however facilities provided which allow different clients
be directed to different sets of data, so as to adapt to such client to be directed to different sets of data, so as to adapt to such
characteristics as cpu architecture. client characteristics as CPU architecture.
11.5. Additional Client-side Considerations 11.5. Additional Client-side Considerations
When clients make use of servers that implement referrals, When clients make use of servers that implement referrals,
replication, and migration, care should be taken so that a user who replication, and migration, care should be taken so that a user who
mounts a given file system that includes a referral or a relocated mounts a given file system that includes a referral or a relocated
file system continue to see a coherent picture of that user-side file file system continues to see a coherent picture of that user-side
system despite the fact that it contains a number of server-side file file system despite the fact that it contains a number of server-side
systems which may be on different servers. file systems which may be on different servers.
One important issue is upward navigation from the root of a server- One important issue is upward navigation from the root of a server-
side file system to its parent (specified as ".." in UNIX). The side file system to its parent (specified as ".." in UNIX), in the
client needs to determine when it hits an fsid root going up the file case in which it transitions to that filesystem as a result of
tree. When at such a point, and needs to ascend to the parent, it referral, migration, or a transition as a result of replication.
must do so locally instead of sending a LOOKUPP call to the server. When at such a point, and it needs to ascend to the parent, it must
The LOOKUPP would normally return the ancestor of the target file go back to the parent as seen within the multi-server namespace
system on the target server, which may not be part of the space that rather than issuing a LOOKUPP call to the server, which would result in
the client mounted. the parent within that server's single-server namespace. In order to
do this, the client needs to remember the filehandles that represent
A related issue is upward navigation from named attribute such filesystem roots, and use these instead of issuing a LOOKUPP to
directories. The named attribute directories are essentially the current server. This will allow the client to present to
detached from the namespace and this property should be safely applications a consistent namespace, where upward navigation and
represented in the client operating environment. LOOKUPP on a named downward navigation are consistent.
attribute directory may return the filehandle of the associated file
and conveying this to applications might be unsafe as many
applications expect the parent of a directory to be a directory by
itself. Therefore the client may want to hide the parent of named
attribute directories (represented as ".." in UNIX) or represent the
named attribute directory as its own parent (as typically done for
the file system root directory in UNIX)
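The approach to upward navigation described in the -14 (right-hand)
text, in which the client remembers the filehandles of file system
roots it reached through a transition, can be sketched as follows.
The class and method names are hypothetical, filehandles are modeled
as opaque byte strings keyed together with the server that issued
them, and the lookupp argument stands in for whatever routine the
client uses to send LOOKUPP.

```python
class NamespaceMap:
    """Remembers, for each file system root reached by referral,
    migration, or replication, its parent in the multi-server
    namespace."""

    def __init__(self):
        # (server, root filehandle) -> (server, parent filehandle)
        self._parent_of_root = {}

    def record_transition(self, server, root_fh, parent_server, parent_fh):
        """Call when crossing into a file system on another server."""
        self._parent_of_root[(server, root_fh)] = (parent_server, parent_fh)

    def parent(self, server, fh, lookupp):
        """Resolve ".." for fh on server.

        For a remembered root, the parent comes from the local table
        and no LOOKUPP is sent; otherwise lookupp(server, fh) is used.
        """
        key = (server, fh)
        if key in self._parent_of_root:
            return self._parent_of_root[key]
        return lookupp(server, fh)
```

Keying on the server as well as the filehandle is a design choice of
the sketch, made because filehandle values are only meaningful to the
server that issued them.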
Another issue concerns refresh of referral locations. When referrals Another issue concerns refresh of referral locations. When referrals
are used extensively, they may change as server configurations are used extensively, they may change as server configurations
change. It is expected that clients will cache information related change. It is expected that clients will cache information related
to traversing referrals so that future client side requests are to traversing referrals so that future client side requests are
resolved locally without server communication. This is usually resolved locally without server communication. This is usually
rooted in client-side name lookup caching. Clients should rooted in client-side name lookup caching. Clients should
periodically purge this data for referral points in order to detect periodically purge this data for referral points in order to detect
changes in location information. When the change attribute changes changes in location information. When the change_policy attribute
for directories that hold referral entries or for the referral changes for directories that hold referral entries or for the
entries themselves, clients should consider any associated cached referral entries themselves, clients should consider any associated
referral information to be out of date. cached referral information to be out of date.
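The two refresh rules for cached referral information, periodic
purging and invalidation when the relevant attribute changes, can be
sketched as follows. This is a simplified illustration and not a
prescribed client design. The class name, the max_age parameter, and
the use of a single attribute value per cached path are assumptions;
the attribute in question is change in the -13 text and change_policy
in the -14 text, and the sketch treats it as an opaque comparable
value.

```python
class ReferralCache:
    """Caches referral locations per path, with two expiry rules."""

    def __init__(self, max_age):
        self.max_age = max_age  # seconds before an entry is purged
        # path -> (locations, attribute value when cached, time cached)
        self._entries = {}

    def store(self, path, locations, attr_value, now):
        self._entries[path] = (list(locations), attr_value, now)

    def lookup(self, path, attr_value, now):
        """Return cached locations, or None if absent or out of date.

        attr_value is the most recently observed value of the
        attribute for the referral entry or its containing directory.
        """
        entry = self._entries.get(path)
        if entry is None:
            return None
        locations, cached_value, stored_at = entry
        if cached_value != attr_value or now - stored_at > self.max_age:
            # Out of date: drop it so the locations are fetched again.
            del self._entries[path]
            return None
        return locations
```

The caller passes the current time explicitly, which keeps the sketch
deterministic and leaves the choice of clock to the client.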
11.6. Effecting File System Transitions 11.6. Effecting File System Transitions
Transitions between file system instances, whether due to switching Transitions between file system instances, whether due to switching
between replicas upon server unavailability, or in response to a between replicas upon server unavailability, or in response to
server-initiated migration events are best dealt with together. Even server-initiated migration events are best dealt with together. This
though the prototypical use cases of replication and migration is so even though for the server pragmatic considerations will
contain distinctive sets of features, when all possibilities for normally force different implementation strategies for planned and
these operations are considered, the underlying unity of these unplanned transitions. Even though the prototypical use cases of
operations, from the client's point of view is clear, even though for replication and migration contain distinctive sets of features, when
the server pragmatic considerations will normally force different all possibilities for these operations are considered, there is an
implementation strategies for planned and unplanned transitions. underlying unity of these operations, from the client's point of
view, that makes treating them together desirable.
A number of methods are possible for servers to replicate data and to A number of methods are possible for servers to replicate data and to
track client state in order to allow clients to transition between track client state in order to allow clients to transition between
file system instances with a minimum of disruption. Such methods file system instances with a minimum of disruption. Such methods
vary between those that use inter-server clustering techniques to vary between those that use inter-server clustering techniques to
limit the changes seen by the client, to those that are less limit the changes seen by the client, to those that are less
aggressive, use more standard methods of replicating data, and impose aggressive, use more standard methods of replicating data, and impose
a greater burden on the client to adapt to the transition. a greater burden on the client to adapt to the transition.
The NFSv4.1 protocol does not impose choices on clients and servers The NFSv4.1 protocol does not impose choices on clients and servers
skipping to change at page 207, line 9 skipping to change at page 209, line 27
types. Two file systems that belong to such a class share some types. Two file systems that belong to such a class share some
important aspect of file system behavior that clients may depend upon important aspect of file system behavior that clients may depend upon
when present, to easily effect a seamless transition between file when present, to easily effect a seamless transition between file
system instances. Conversely, where the file systems do not belong system instances. Conversely, where the file systems do not belong
to such a common class, the client has to deal with various sorts of to such a common class, the client has to deal with various sorts of
implementation discontinuities which may cause performance or other implementation discontinuities which may cause performance or other
issues in effecting a transition. issues in effecting a transition.
Where the fs_locations_info attribute is available, such file system Where the fs_locations_info attribute is available, such file system
classification data will be made directly available to the client. classification data will be made directly available to the client.
See Section 11.10 for details. When only fs_locations is available, See Section 11.9 for details. When only fs_locations is available,
default assumptions with regard to such classifications have to be default assumptions with regard to such classifications have to be
inferred. See Section 11.9 for details. inferred. See Section 11.8 for details.
In cases in which one server is expected to accept opaque values from In cases in which one server is expected to accept opaque values from
the client that originated from another server, it is a wise the client that originated from another server, the servers SHOULD
implementation practice for the servers to encode the "opaque" values encode the "opaque" values in big endian octet order. If this is
in big endian octet order. If this is done, servers acting as done, servers acting as replicas or immigrating file systems will be
replicas or immigrating file systems will be able to parse values able to parse values like stateids, directory cookies, filehandles,
like stateids, directory cookies, filehandles, etc. even if their etc. even if their native octet order is different from that of other
native octet order is different from that of other servers servers cooperating in the replication and migration of the file
cooperating in the replication and migration of the file system. system.
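
As an illustration of the octet-order recommendation above, the following minimal sketch packs and unpacks a 12-octet opaque value in big endian order. The field layout (a 32-bit server instance number followed by a 64-bit counter) and the function names are assumptions made for this example only; the protocol treats such values as opaque and does not define their contents.

```python
import struct
from typing import Tuple

# Hypothetical layout for a 12-octet opaque value (for example, the "other"
# field of a stateid): a 32-bit instance number, then a 64-bit counter.
# This layout is an assumption; the protocol does not define it.
_OPAQUE_FMT = ">IQ"  # ">" selects big endian order with no padding


def encode_opaque(instance: int, counter: int) -> bytes:
    """Encode the two fields in big endian octet order."""
    return struct.pack(_OPAQUE_FMT, instance, counter)


def decode_opaque(raw: bytes) -> Tuple[int, int]:
    """Decode a value produced by any cooperating server, whatever its native order."""
    return struct.unpack(_OPAQUE_FMT, raw)
```

Because the format string fixes the octet order explicitly, a replica running on a little endian host decodes the same field values that a big endian source server encoded.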
11.6.1. File System Transitions and Simultaneous Access 11.6.1. File System Transitions and Simultaneous Access
When a single file system may be accessed at multiple locations, When a single file system may be accessed at multiple locations,
whether this is because of an indication of file system identity as whether this is because of an indication of file system identity as
reported by the fs_locations or fs_locations_info attributes or reported by the fs_locations or fs_locations_info attributes or
because two file systems instances have corresponding locations on because two file systems instances have corresponding locations on
server addresses which connect to the same server as indicated by a server addresses which connect to the same server as indicated by a
common so_major_id field in the eir_server_owner field returned by common so_major_id field in the eir_server_owner field returned by
EXCHANGE_ID, the client will, depending on specific circumstances as EXCHANGE_ID, the client will, depending on specific circumstances as
discussed below, either: discussed below, either:
o Access multiple instances simultaneously, as representing o The client accesses multiple instances simultaneously, as
alternate paths to the same data and metadata. representing alternate paths to the same data and metadata.
o The client accesses one instance (or set of instances) and then o The client accesses one instance (or set of instances) and then
transitions to an alternative instance (or set of instances) as a transitions to an alternative instance (or set of instances) as a
result of network issues, server unresponsiveness, or server- result of network issues, server unresponsiveness, or server-
directed migration. The transition may involve changes in directed migration. The transition may involve changes in
filehandles, fileids, the change attribute, and or locking state, filehandles, fileids, the change attribute, and/or locking state,
depending on the attributes of the source and destination file depending on the attributes of the source and destination file
system instances, as specified in the fs_locations_info attribute. system instances, as specified in the fs_locations_info attribute.
Which of these choices is possible, and how a transition is effected Which of these choices is possible, and how a transition is effected,
is governed by equivalence classes of file system instances as is governed by equivalence classes of file system instances as
reported by the fs_locations_info attribute, and, for file systems reported by the fs_locations_info attribute, and, for file systems
instances in the same location within multiple single-server instances in the same location within multiple single-server
namespace, by the so_major_id field in the eir_server_owner field namespace as indicated by the so_major_id field in the
returned by EXCHANGE_ID. eir_server_owner field returned by EXCHANGE_ID.
11.6.2. Simultaneous Use and Transparent Transitions 11.6.2. Simultaneous Use and Transparent Transitions
When two file system instances have the same location within their When two file system instances have the same location within their
respective single-server namespaces and those two server IP addresses respective single-server namespaces and those two server network
return the so_major_id value in the eir_server_owner value returned addresses return the same so_major_id value in the eir_server_owner
in response to EXCHANGE_ID, those file systems instances can be value returned in response to EXCHANGE_ID, those file systems
treated as the same, and either used together simultaneously or instances can be treated as the same, and either used together
serially with no transition activity required on the part of the simultaneously or serially with no transition activity required on
client. the part of the client. In this case we refer to the transition as
"transparent" and the client in transferring access from one to the other
is acting as it would in the event that communication is interrupted,
with a new connection and possibly a new session being established to
continue access to the same filesystem.
Whether simultaneous use of the two file system instances is valid is Whether simultaneous use of the two file system instances is valid is
controlled by whether the fs_locations_info attribute shows the two controlled by whether the fs_locations_info attribute shows the two
instances as having the same _simultaneous-use_ class. instances as having the same _simultaneous-use_ class. See
Section 11.9.1 for information about the definition of the various
use classes, including the _simultaneous-use_ class.
Note that for two such file systems, any information within the Note that for two such file systems, any information within the
fs_locations_info attribute that indicates the need for special fs_locations_info attribute that indicates the need for special
transition activity, i.e. the appearance of the two file system transition activity, i.e. the appearance of the two file system
instances with different _handle_, _fileid_, _verifier_, _change_ instances with different _handle_, _fileid_, _write-verifier_,
classes, MUST be ignored by the client. The server SHOULD not _change_, _readdir_ classes, indicates a serious problem and the
indicate that these instances belong to different _handle_, _fileid_, client, if it allows transition to the filesystem instance at all,
_verifier_, _change_ classes, whether the two instances are shown must not treat this as a transparent transition. The server SHOULD
belonging to the same _simultaneous-use_ class or not. NOT indicate that these instances belong to different _handle_,
_fileid_, _write-verifier_, _change_, _readdir_ classes, whether the
two instances are shown belonging to the same _simultaneous-use_
class or not.
Where these conditions do not apply, a non-transparent file system Where these conditions do not apply, a non-transparent file system
instance transition is required with the details depending on the instance transition is required with the details depending on the
respective _handle_, _fileid_, _verifier_, _change_ classes of the respective _handle_, _fileid_, _verifier_, _change_, _readdir_
two file system instances and whether the two servers in question classes of the two file system instances and whether the two servers
have the same eir_server_scope value as reported by EXCHANGE_ID. in question have the same eir_server_scope value as reported by
EXCHANGE_ID.
11.6.2.1. Simultaneous Use of File System Instances 11.6.2.1. Simultaneous Use of File System Instances
When the conditions above hold, in either of the following two cases, When the conditions above hold, in either of the following two cases,
the client may use the two file system instances simultaneously. the client may use the two file system instances simultaneously.
o The fs_locations_info attribute does not contain separate per-IP o The fs_locations_info attribute does not contain separate per-
address entries for file systems instances at the distinct IP network-address entries for file systems instances at the distinct
addresses. This includes the case in which the fs_locations_info network addresses. This includes the case in which the
attribute is unavailable. fs_locations_info attribute is unavailable. In this case, the
fact that the eir_server_owner values share an so_major_id value
justifies simultaneous use, and there is no fs_locations_info
attribute information contradicting that.
o The fs_locations_info attribute indicates that two file system o The fs_locations_info attribute indicates that two file system
instances belong to the same _simultaneous-use_ class. instances belong to the same _simultaneous-use_ class.
In this case, the client may use both file system instances In this case, the client may use both file system instances
simultaneously, as representations of the same file system, whether simultaneously, as representations of the same file system, whether
that happens because the two IP addresses connect to the same that happens because the two network addresses connect to the same
physical server or because different servers connect to clustered physical server or because different servers connect to clustered
file systems and export their data in common. When simultaneous use file systems and export their data in common. When simultaneous use
is in effect, any change made to one file system instance must be is in effect, any change made to one file system instance must be
immediately reflected in the other file system instance(s). Locks immediately reflected in the other file system instance(s). Locks
are treated as part of a common lease, associated with a common are treated as part of a common lease, associated with a common
client ID. Depending on the details of the eir_server_owner returned client ID. Depending on the details of the eir_server_owner returned
by EXCHANGE_ID, the two server instances may be accessed by different by EXCHANGE_ID, the two server instances may be accessed by different
sessions or a single session in common. sessions or a single session in common.
11.6.2.2. Transparent File System Transitions 11.6.2.2. Transparent File System Transitions
When the conditions above hold and the fs_locations_info attribute When the conditions above hold and the fs_locations_info attribute
explicitly shows the file system instances for these distinct IP explicitly shows the file system instances for these distinct network
addresses as belonging to different _simultaneous-use_ classes, the addresses as belonging to different _simultaneous-use_ classes, the
file system instances should not be used by the client file system instances should not be used by the client
simultaneously, but rather serially with one being used unless and simultaneously, but rather serially with one being used unless and
until communication difficulties, lack of responsiveness, or an until communication difficulties, lack of responsiveness, or an
explicit migration event causes another file system instance (or set explicit migration event causes another file system instance (or set
of file system instances sharing a common _simultaneous-use_ class to of file system instances sharing a common _simultaneous-use_ class)
be used. to be used.
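
The choice between simultaneous use, serial (transparent) use, and a non-transparent transition described in this section can be sketched as a small decision function. The Instance type, its field names, and the returned labels are hypothetical and exist only for this example. A simultaneous_use_class of None stands for the case in which fs_locations_info is unavailable or has no separate per-network-address entry; treating a single missing entry the same way is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    location: str                           # path within the single-server namespace
    so_major_id: bytes                      # from eir_server_owner (EXCHANGE_ID)
    simultaneous_use_class: Optional[int]   # None: no per-address entry available


def classify(a: Instance, b: Instance) -> str:
    """Return how a client may treat two file system instances."""
    # Different location or different server: a non-transparent transition.
    if a.location != b.location or a.so_major_id != b.so_major_id:
        return "non-transparent"
    # No class information contradicts simultaneous use (assumption: either
    # instance lacking an entry counts as "no separate entries").
    if a.simultaneous_use_class is None or b.simultaneous_use_class is None:
        return "simultaneous"
    if a.simultaneous_use_class == b.simultaneous_use_class:
        return "simultaneous"
    # Explicitly different classes: use one instance at a time.
    return "transparent-serial"
```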
When a change in file system instance is to be done, the client will When a change of file system instance is to be done, the client will
use the same client ID already in effect. If it already has use the same client ID already in effect. If it already has
connections to the new server address, these will be used. Otherwise connections to the new server address, these will be used. Otherwise
new connections to existing sessions or new sessions associated with new connections to existing sessions or new sessions associated with
the existing client ID are established as indicated by the the existing client ID are established as indicated by the
eir_server_owner returned by EXCHANGE_ID. eir_server_owner returned by EXCHANGE_ID.
In all such transparent transition cases, the following apply: In all such transparent transition cases, the following apply:
o File handles stay the same if persistent and if volatile are only o File handles stay the same if persistent and if volatile are only
subject to expiration, if they would be in the absence of file subject to expiration, if they would be in the absence of file
skipping to change at page 209, line 44 skipping to change at page 212, line 27
o Fileid values do not change across the transition. o Fileid values do not change across the transition.
o The file system will have the same fsid in both the old and new o The file system will have the same fsid in both the old and new
locations. locations.
o Change attribute values are consistent across the transition and o Change attribute values are consistent across the transition and
do not have to be refetched. When change attributes indicate that do not have to be refetched. When change attributes indicate that
a cached object is still valid, it can remain cached. a cached object is still valid, it can remain cached.
o Client, and state identifier retain their validity across the o Client and state identifiers retain their validity across the
transition, except where their staleness is recognized and transition, except where their staleness is recognized and
reported by the new server. Except where such staleness requires reported by the new server. Except where such staleness requires
it, no lock reclamation is needed. it, no lock reclamation is needed. Any such staleness is an
indication that the server should be considered to have rebooted
and is reported as discussed in Section 8.4.2.
o Write verifiers are presumed to retain their validity and can be o Write verifiers are presumed to retain their validity and can be
presented to COMMIT, with the expectation that if COMMIT on the used to compare with verifiers returned by COMMIT on the new
new server accept them as valid, then that server has all of the server, with the expectation that if COMMIT on the new server
returns an identical verifier, then that server has all of the
data unstably written to the original server and has committed it data unstably written to the original server and has committed it
to stable storage as requested. to stable storage as requested.
o Readdir cookies are presumed to retain their validity and can be
presented to subsequent READDIR requests together with the readdir
verifier with which they are associated. When the verifier is
accepted as valid, the cookie will continue the READDIR operation
so that the entire directory can be obtained by the client.
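
The write verifier item in the list above can be illustrated with a short client-side sketch. The UnstableWrite record and the writes_to_resend function are hypothetical names introduced for this example. The sketch assumes the client remembers the verifier returned with each unstable WRITE and compares it with the verifier returned by COMMIT on the new server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnstableWrite:
    offset: int
    length: int
    verifier: bytes  # write verifier returned with the unstable WRITE


def writes_to_resend(pending: List[UnstableWrite],
                     commit_verifier: bytes) -> List[UnstableWrite]:
    """Return the writes whose verifier differs from the COMMIT verifier.

    An identical verifier means the new server holds the unstably written
    data and has committed it; any other write has to be sent again.
    """
    return [w for w in pending if w.verifier != commit_verifier]
```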
11.6.3. Filehandles and File System Transitions 11.6.3. Filehandles and File System Transitions
There are a number of ways in which filehandles can be handled across There are a number of ways in which filehandles can be handled across
a file system transition. These can be divided into two broad a file system transition. These can be divided into two broad
classes depending upon whether the two file systems across which the classes depending upon whether the two file systems across which the
transition happens share sufficient state to effect some sort of transition happens share sufficient state to effect some sort of
continuity of file system handling. continuity of file system handling.
When there is no such co-operation in filehandle assignment, the two When there is no such co-operation in filehandle assignment, the two
file systems are reported as being in different _handle_ classes. In file systems are reported as being in different _handle_ classes. In
this case, all filehandles are assumed to expire as part of the file this case, all filehandles are assumed to expire as part of the file
system transition. Note that this behavior does not depend on system transition. Note that this behavior does not depend on
fh_expire_type attribute and supersedes the specification of fh_expire_type attribute and supersedes the specification of
FH4_VOL_MIGRATION bit, which only affects behavior when FH4_VOL_MIGRATION bit, which only affects behavior when
fs_locations_info is not available. fs_locations_info is not available.
When there is co-operation in filehandle assignment, the two file When there is co-operation in filehandle assignment, the two file
systems are reported as being in the same _handle_ classes. In this systems are reported as being in the same _handle_ classes. In this
case, persistent filehandle remain valid after the file system case, persistent filehandles remain valid after the file system
transition, while volatile filehandles (excluding those which are transition, while volatile filehandles (excluding those which are
only volatile due to the FH4_VOL_MIGRATION bit) are subject to only volatile due to the FH4_VOL_MIGRATION bit) are subject to
expiration on the target server. expiration on the target server.
11.6.4. Fileid's and File System Transitions 11.6.4. Fileids and File System Transitions
In NFSv4.0, the issue of continuity of fileid's in the event of a In NFSv4.0, the issue of continuity of fileids in the event of a file
file system transition was not addressed. The general expectation system transition was not addressed. The general expectation had
had been that in situations in which the two file system instances been that in situations in which the two file system instances are
are created by a single vendor using some sort of file system image created by a single vendor using some sort of file system image copy,
copy, fileid's will be consistent across the transition while in the fileids will be consistent across the transition while in the
analogous multi-vendor transitions they will not. This poses analogous multi-vendor transitions they will not. This poses
difficulties, especially for the client without special knowledge of difficulties, especially for the client without special knowledge of
the of the transition mechanisms adopted by the server. the transition mechanisms adopted by the server. Note that although
fileid is not a mandatory attribute, many servers provide it and
many clients provide APIs that depend on it.
It is important to note that while clients themselves may have no It is important to note that while clients themselves may have no
trouble with a fileid changing as a result of a file system trouble with a fileid changing as a result of a file system
transition event, applications do typically have access to the fileid transition event, applications do typically have access to the fileid
(e.g. via stat), and the result of this is that an application may (e.g. via stat), and the result of this is that an application may
work perfectly well if there is no file system instance transition or work perfectly well if there is no file system instance transition or
if any such transition is among instances created by a single vendor, if any such transition is among instances created by a single vendor,
yet be unable to deal with the situation in which a multi-vendor yet be unable to deal with the situation in which a multi-vendor
transition occurs, at the wrong time. transition occurs, at the wrong time.
Providing the same fileid's in a multi-vendor (multiple server Providing the same fileids in a multi-vendor (multiple server
vendors) environment has generally been held to be quite difficult. vendors) environment has generally been held to be quite difficult.
While there is work to be done, it needs to be pointed out that this While there is work to be done, it needs to be pointed out that this
difficulty is partly self-imposed. Servers have typically identified difficulty is partly self-imposed. Servers have typically identified
fileid with inode number, i.e. with a quantity used to find the file fileid with inode number, i.e. with a quantity used to find the file
in question. This identification poses special difficulties for in question. This identification poses special difficulties for
migration of an fs between vendors where assigning the same index to migration of a filesystem between vendors where assigning the same
a given file may not be possible. Note here that a fileid does not index to a given file may not be possible. Note here that a fileid
require that it be useful to find the file in question, only that it is not required to be useful to find the file in question, only that
is unique within the given fs. Servers prepared to accept a fileid it is unique within the given filesystem. Servers prepared to accept
as a single piece of metadata and store it apart from the value used a fileid as a single piece of metadata and store it apart from the
to index the file information can relatively easily maintain a fileid value used to index the file information can relatively easily
value across a migration event, allowing a truly transparent maintain a fileid value across a migration event, allowing a truly
migration event. transparent migration event.
In any case, where servers can provide continuity of fileids, they In any case, where servers can provide continuity of fileids, they
should and the client should be able to find out that such continuity should, and the client should be able to find out that such
is available, and take appropriate action. Information about the continuity is available and take appropriate action. Information
continuity (or lack thereof) of fileid's across a file system is about the continuity (or lack thereof) of fileids across a file
represented by specifying whether the file systems in question are of system transition is represented by specifying whether the file
the same _fileid_ class. systems in question are of the same _fileid_ class.
Note that when consistent fileids do not exist across a transition
(either because there is no continuity of fileids or because fileid
is not a supported attribute on one of the instances involved), and there
are no reliable filehandles across a transition event (either because
there is no filehandle continuity or because the filehandles are
volatile), the client is in a position where it cannot verify that
files it was accessing before the transition are the same objects.
It is forced to assume that no object has been renamed, and, unless
there are guarantees that provide this (e.g. the filesystem is read-
only), problems for applications may occur. Therefore, use of such
configurations should be limited to situations where the problems
that this may cause can be tolerated.
11.6.5. Fsids and File System Transitions 11.6.5. Fsids and File System Transitions
Since fsids are only unique within a per-server basis, it is to be Since fsids are generally only unique within a per-server basis, it
expected that they will change during a file system transition. is likely that they will change during a file system transition. One
Clients should not make the fsid's received from the server visible exception is the case of transparent transitions, but in that case we
to application since they may not be globally unique, and because have multiple network addresses that are defined as the same server
they may change during a file system transition event. Applications (as specified by so_major_id field of eir_server_owner). Clients
are best served if they are isolated from such transitions to the should not make the fsids received from the server visible to
extent possible. applications since they may not be globally unique, and because they
may change during a file system transition event. Applications are
best served if they are isolated from such transitions to the extent
possible.
Although normally, a single source filesystem will transition to a
single target filesystem, there is a provision for splitting a single
source filesystem into multiple target filesystems, by specifying the
FSLI4F_MULTI_FS flag.
11.6.5.1. File System Splitting
When a file system transition is made and the fs_locations_info When a file system transition is made and the fs_locations_info
indicates that file system in question may be split into multiple indicates that the file system in question may be split into multiple
file systems (via the FSLI4F_MULTI_FS flag), client should do file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do
GETATTR's on all known objects within the file system undergoing GETATTRs to determine the fsid attribute on all known objects within
transition, to determine the new file system boundaries. Clients may the file system undergoing transition to determine the new file
maintain the fsid's passed to existing applications by mapping all of system boundaries.
the fsid for the descendent file systems to a the common fsid used
for the original file system. Clients may maintain the fsids passed to existing applications by
mapping all of the fsids for the descendent file systems to the
common fsid used for the original file system.
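
One possible way for a client to keep application-visible fsids stable across a split is sketched below. The FsidPresenter class and its method names are hypothetical. The sketch assumes the client has already issued GETATTRs to learn the fsids of the descendent file systems and represents each fsid as a (major, minor) pair.

```python
from typing import Iterable, Set, Tuple

Fsid = Tuple[int, int]  # (major, minor)


class FsidPresenter:
    """Maps fsids of descendent file systems back to the original fsid."""

    def __init__(self, original: Fsid) -> None:
        self.original = original
        self._descendants: Set[Fsid] = set()

    def record_split(self, new_fsids: Iterable[Fsid]) -> None:
        # new_fsids: fsids discovered via GETATTR after the transition
        self._descendants.update(new_fsids)

    def visible_fsid(self, server_fsid: Fsid) -> Fsid:
        # Applications keep seeing the fsid they were given before the split.
        if server_fsid in self._descendants:
            return self.original
        return server_fsid
```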
Splitting a filesystem may be done on a transition between
filesystems of the same _fileid_ class, since the fact that fileids
are unique within the source filesystem ensures they will be unique in
each of the target filesystems.
11.6.6. The Change Attribute and File System Transitions 11.6.6. The Change Attribute and File System Transitions
Since the change attribute is defined as a server-specific one, Since the change attribute is defined as a server-specific one,
change attributes fetched from one server are normally presumed to be change attributes fetched from one server are normally presumed to be
invalid on another server. Such a presumption is troublesome since invalid on another server. Such a presumption is troublesome since
it would invalidate all cached change attributes, requiring it would invalidate all cached change attributes, requiring
refetching. Even more disruptive, the absence of any assured refetching. Even more disruptive, the absence of any assured
continuity for the change attribute means that even if the same value continuity for the change attribute means that even if the same value
is gotten on refetch no conclusions can be drawn as to whether the is gotten on refetch no conclusions can be drawn as to whether the
object in question has changed. The identical change attribute could object in question has changed. The identical change attribute could
be merely an artifact, of a modified file with a different change be merely an artifact of a modified file with a different change
attribute construction algorithm, with that new algorithm just attribute construction algorithm, with that new algorithm just
happening to result in an identical change value. happening to result in an identical change value.
When the two file systems have consistent change attribute formats, When the two file systems have consistent change attribute formats,
and this fact is communicated to the client by reporting as in the and this fact is communicated to the client by reporting as in the
same _change_ class, the client may assume a continuity of change same _change_ class, the client may assume a continuity of change
attribute construction and handle this situation just as it would be attribute construction and handle this situation just as it would be
handled without any file system transition. handled without any file system transition.
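
The effect of the _change_ class on cache validation can be sketched as follows. The function name and parameters are hypothetical. The sketch assumes the client has a cached change value from the old instance and a value refetched from the new instance.

```python
def cached_object_still_valid(same_change_class: bool,
                              cached_change: int,
                              refetched_change: int) -> bool:
    """Decide whether a cached object survives a file system transition."""
    if not same_change_class:
        # No continuity of change attribute construction: an equal value
        # could be a coincidence, so nothing can be concluded from it.
        return False
    # Same _change_ class: compare exactly as without any transition.
    return cached_change == refetched_change
```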
11.6.7. Lock State and File System Transitions 11.6.7. Lock State and File System Transitions
In a file system transition, the client needs to handle cases in In a file system transition, the client needs to handle cases in
which the two servers have cooperated in state management and in which the two servers have cooperated in state management and in
which they have not. Cooperation by two servers in state management which they have not. Cooperation by two servers in state management
requires coordination of clientids. Before the client attempts to requires coordination of client IDs. Before the client attempts to
use a client ID associated with one server in a request to the server use a client ID associated with one server in a request to the server
of the other file system, it must eliminate the possibility that two of the other file system, it must eliminate the possibility that two
non-cooperating servers have assigned the same client ID by accident. non-cooperating servers have assigned the same client ID by accident.
The client needs to compare the eir_server_scope values returned by The client needs to compare the eir_server_scope values returned by
each server. If the scope values do not match, then the servers have each server. If the scope values do not match, then the servers have
not cooperated in state management. If the scope values match, then not cooperated in state management. If the scope values match, then
this indicates the servers have cooperated in assigning clientids to this indicates the servers have cooperated in assigning client IDs to
the point that they will reject clientids that refer to state they do the point that they will reject client IDs that refer to state they
not know about. do not know about.
In the case of migration, the servers involved in the migration of a In the case of migration, the servers involved in the migration of a
file system SHOULD transfer all server state from the original to the file system SHOULD transfer all server state from the original to the
new server. When this done, it must be done in a way that is new server. When this is done, it must be done in a way that is
transparent to the client. With replication, such a degree of common transparent to the client. With replication, such a degree of common
state is typically not the case. Clients, however, should use the state is typically not the case. Clients, however, should use the
information provided by the eir_server_scope returned by EXCHANGE_ID information provided by the eir_server_scope returned by EXCHANGE_ID
to determine whether such sharing may be in effect, rather than to determine whether such sharing may be in effect, rather than
making assumptions based on the reason for the transition. making assumptions based on the reason for the transition.
This state transfer will reduce disruption to the client when a file This state transfer will reduce disruption to the client when a file
system transition If the servers are successful in transferring all system transition occurs. If the servers are successful in
state, the client can attempt to establish sessions associated with transferring all state, the client can attempt to establish sessions
the client ID used for the source file system instance. If the associated with the client ID used for the source file system
server accepts that as a valid client ID, then the client may used instance. If the server accepts that as a valid client ID, then the
the existing stateid's associated with that client ID for the old client may use the existing stateids associated with that client ID
file system instance in connection with the that same client ID in for the old file system instance in connection with that same client
connection with the file system instance. ID in connection with the transitioned file system instance.
When the two servers belong to the same server scope, it does When the two servers belong to the same server scope, it does not
necessarily mean that when dealing with the transition, the client mean that when dealing with the transition, the client will not have
will not have to reclaim state. However it does mean that the client to reclaim state. However it does mean that the client may proceed
may proceed using his current client ID when establishing using his current client ID when establishing communication with the
communication with the new server and the new server will either new server and the new server will either recognize the client ID as
recognize the client ID as valid, or reject it, in which case locks valid, or reject it, in which case locks must be reclaimed by the
must be reclaimed by the client. client.
File systems co-operating in state management may actually share File systems co-operating in state management may actually share
state or simply divide the id space so as to recognize (and reject as state or simply divide the id space so as to recognize (and reject as
stale) each others state and clients id's. Servers which do share stale) each other's stateids and client IDs. Servers which do share
state may not do so under all conditions or at all times. The state may not do so under all conditions or at all times. The
requirement for the server is that if it cannot be sure in accepting requirement for the server is that if it cannot be sure in accepting
a client ID that it reflects the locks the client was given, it must a client ID that it reflects the locks the client was given, it must
treat all associated state as stale and report it as such to the treat all associated state as stale and report it as such to the
client. client.
When the two file systems instances are on servers that do not share When the two file system instances are on servers that do not share a
a server scope value the client must establish a new client ID on the server scope value the client must establish a new client ID on the
destination, if it does not have one already and reclaim if possible. destination, if it does not have one already, and reclaim locks if
In this case, old stateids and client ID's should not be presented to possible. In this case, old stateids and client ID's should not be
the new server since there is no assurance that they will not presented to the new server since there is no assurance that they
conflict with IDs valid on that server. will not conflict with IDs valid on that server.
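The choice described above, between reusing a client ID and establishing a new one, can be sketched as follows. This is only an illustrative sketch: the ServerIdentity class, the plan_transition function, and the returned action strings are hypothetical names, not part of the protocol. Only the comparison of eir_server_scope values is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerIdentity:
    # Hypothetical container; only eir_server_scope is named in the text.
    eir_server_scope: bytes


def plan_transition(source: ServerIdentity,
                    destination: ServerIdentity,
                    client_id_accepted: Optional[bool] = None) -> str:
    """Return the client's next step after a file system transition.

    client_id_accepted is None until the existing client ID has been
    presented to the destination; it is only meaningful when both
    servers report the same server scope.
    """
    if source.eir_server_scope != destination.eir_server_scope:
        # Different scopes: old client IDs and stateids must not be presented.
        return "establish_new_client_id_and_reclaim"
    if client_id_accepted is None:
        # Same scope: the existing client ID may be tried first.
        return "present_existing_client_id"
    if client_id_accepted:
        # The destination recognized the client ID; existing stateids may be used.
        return "continue_with_existing_state"
    # The destination rejected the client ID; locks must be reclaimed.
    return "reclaim_locks"
```

Note that a matching server scope only permits the attempt; the client must still be prepared for the rejection case.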
In either case, when actual locks are not known to be maintained, the In either case, when actual locks are not known to be maintained, the
destination server may establish a grace period specific to the given destination server may establish a grace period specific to the given
file system, with non-reclaim locks being rejected for that file file system, with non-reclaim locks being rejected for that file
system, even though normal locks are being granted for other file system, even though normal locks are being granted for other file
systems. Clients should not infer the absence of a grace period for systems. Clients should not infer the absence of a grace period for
file systems being transitioned to a server from responses to file systems being transitioned to a server from responses to
requests for other file systems. requests for other file systems.
In the case of lock reclamation for a given file system after a file In the case of lock reclamation for a given file system after a file
system transition, edge conditions can arise similar to those for system transition, edge conditions can arise similar to those for
reclaim after server reboot (although in the case of the planned reclaim after server reboot (although in the case of the planned
state transfer associated with migration, these can be avoided by state transfer associated with migration, these can be avoided by
securely recording lock state as part of state migration. Where the securely recording lock state as part of state migration). Unless
destination server cannot guarantee that locks will not be the destination server can guarantee that locks will not be
incorrectly granted, the destination server should not establish a incorrectly granted, the destination server should not allow lock
file-system-specific grace period. reclaims and avoid establishing a grace period.
Once all locks have been reclaimed, or there were no locks to Once all locks have been reclaimed, or there were no locks to
reclaim, the client indicates that there are no more reclaims to be reclaim, the client indicates that there are no more reclaims to be
done for the filesystem in question by issuing a RECLAIM_COMPLETE done for the filesystem in question by issuing a RECLAIM_COMPLETE
operation with the one_fs paraneter set to true. Once this has been operation with the one_fs parameter set to true. Once this has been
done, non-reclaim locking operations may be done, and any subsequent done, non-reclaim locking operations may be done, and any subsequent
request to do reclaims will be rejected with the error request to do reclaims will be rejected with the error
NFS4ERR_NO_GRACE. NFS4ERR_NO_GRACE.
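A minimal model of the per-file-system reclaim sequence is sketched below, assuming a file-system-specific grace period is in effect on the destination. The FsReclaimState class and its method names are hypothetical, and the model ignores expiration of the grace period by time. The error names are those used by the protocol.

```python
class FsReclaimState:
    """Tracks, for one client and one file system, whether reclaim is done."""

    def __init__(self) -> None:
        self.reclaim_complete = False

    def reclaim_complete_one_fs(self) -> str:
        # Models RECLAIM_COMPLETE with the one_fs parameter set to true.
        self.reclaim_complete = True
        return "NFS4_OK"

    def lock(self, reclaim: bool) -> str:
        """Return the status a locking request would receive in this model."""
        if reclaim:
            # Reclaims are rejected once the client has said it is finished.
            return "NFS4ERR_NO_GRACE" if self.reclaim_complete else "NFS4_OK"
        # Non-reclaim locks wait until reclaim is complete (assumption: the
        # grace period has not otherwise ended).
        return "NFS4_OK" if self.reclaim_complete else "NFS4ERR_GRACE"
```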
Information about client identity that may be propagated between Information about client identity may be propagated between servers
servers in the form of client_owner4 and associated verifiers, under in the form of client_owner4 and associated verifiers, under the
the assumption that the client presents the same values to all the assumption that the client presents the same values to all the
servers with which it deals. servers with which it deals.
Servers are encouraged to provide facilities to allow locks to be Servers are encouraged to provide facilities to allow locks to be
reclaimed on the new server after a file system transition. Often, reclaimed on the new server after a file system transition. Often,
however, in cases in which the two servers do not share a server however, in cases in which the two servers do not share a server
scope value, such facilities may not be available and client should scope value, such facilities may not be available and client should
be prepared to re-obtain locks, even though it is possible that the be prepared to re-obtain locks, even though it is possible that the
client may have his LOCK or OPEN request denied due to a conflicting client may have his LOCK or OPEN request denied due to a conflicting
lock. In some environments, such as the transition between read-only lock.
file systems, such denial of locks should not pose large difficulties
in practice. When an attempt to re-establish a lock on a new server The consequences of having no facilities available to reclaim locks
is denied, the client should treat the situation as if his original on the new server will depend on the type of environment. In some
lock had been revoked. In all cases in which the lock is granted, environments, such as the transition between read-only file systems,
the client cannot assume that no conflicting could have been granted such denial of locks should not pose large difficulties in practice.
in the interim. Where change attribute continuity is present, the When an attempt to re-establish a lock on a new server is denied, the
client may check the change attribute to check for unwanted file client should treat the situation as if his original lock had been
revoked. Note that when the lock is granted, the client cannot
assume that no conflicting lock could have been granted in the
interim. Where change attribute continuity is present, the client
may check the change attribute to check for unwanted file
modifications. Where even this is not available, and the file system modifications. Where even this is not available, and the file system
is not read-only, a client may reasonably treat all pending locks as is not read-only, a client may reasonably treat all pending locks as
having been revoked. having been revoked.
11.6.7.1. Leases and File System Transitions 11.6.7.1. Leases and File System Transitions
In the case of lease renewal, the client may not be submitting In the case of lease renewal, the client may not be submitting
requests for a file system that has been transferred to another requests for a file system that has been transferred to another
server. This can occur because of the lease renewal mechanism. The server. This can occur because of the lease renewal mechanism. The
client renews leases for all file systems when submitting a request client renews the lease associated with all file systems when
on an associated session, regardless of the specific file system submitting a request on an associated session, regardless of the
being referenced. specific file system being referenced.
In order for the client to schedule renewal of leases that may have In order for the client to schedule renewal of leases where there is
been relocated to the new server, the client must find out about locking state that may have been relocated to the new server, the
lease relocation before those leases expire. To accomplish this, the client must find out about lease relocation before those leases
SEQUENCE operation will return the status bit expire. To accomplish this, the SEQUENCE operation will return the
SEQ4_STATUS_LEASE_MOVED, if responsibility for any of the leases to status bit SEQ4_STATUS_LEASE_MOVED, if responsibility for any of the
be renewed has been transferred to a new server. This condition will locking state renewed has been transferred to a new server. This
continue until the client receives an NFS4ERR_MOVED error and the will continue until the client receives an NFS4ERR_MOVED error for
server receives the subsequent GETATTR for the fs_locations or each of the filesystems for which there has been locking state
fs_locations_info attribute for an access to each file system for relocation.
which a lease has been moved to a new server.
When a client receives an SEQ4_STATUS_LEASE_MOVED indication, it When a client receives an SEQ4_STATUS_LEASE_MOVED indication, it
should perform an operation on each file system associated with the should perform an operation on each file system associated with the
server in question. When the client receives an NFS4ERR_MOVED error, server where there is locking state for the current client associated
the client can follow the normal process to obtain the new server with the filesystem in question. The client may choose to reference
information (through the fs_locations and fs_locations_info all filesystems in the interests of simplicity but what is important
attributes) and perform renewal of those leases on the new server, is that it must reference all filesystems for which there was locking
unless information in fs_locations_info attribute shows that no state state where that state moved. Once the client receives an
could have been transferred. If the server has not had state NFS4ERR_MOVED error for each filesystem, the SEQ4_STATUS_LEASE_MOVED
transferred to it transparently, the client will receive indication is cleared. The client can terminate the process of
NFS4ERR_STALE_CLIENTID from the new server, as described above, and checking filesystems once this indication is cleared, since there are
the client can then reclaim locks as is done in the event of server no others for which locking state has moved.
failure. [[Comment.8: Comment from Benny Halevy: server receives the
subsequent GETATTR for the fs_locations or 10959 fs_locations_info A client may use GETATTR of the fs_status (or fs_locations_info)
attribute for an access to each file system for 10960 which a lease attribute on all of the filesystems to get absence indications in a
has been moved to a new server. This paragraph is somewhat troubling single (or a few) request(s), since absent filesystems will not cause
as it says that the server may treat GETATTR as a state-changing an error in this context. However, it still must do an operation
operation but the this state may last indefinitely if the client does which receives NFS4ERR_MOVED on each filesystem, in order to clear
not query all file systems on the server. I think we need to provide the SEQ4_STATUS_LEASE_MOVED indication.
a more precise recommendation to the client implementation that will
deal with corner cases in this area. For example, the client knows Once the set of filesystems with transferred locking state has been
exactly which file systems it has state on (based on state it keeps determined, the client can follow the normal process to obtain the
in the client inode cache). When seeing SEQ4_STATUS_LEASE_MOVED it new server information (through the fs_locations and
can do the GETATTR on each of these file systems to see where they fs_locations_info attributes) and perform renewal of those leases on
were moved to. At this point the client and server should be back in the new server, unless information in fs_locations_info attribute
sync and the client can resume normal operation. If it still gets shows that no state could have been transferred. If the server has
SEQ4_STATUS_LEASE_MOVED and the state lingers (i.e. another scan of not had state transferred to it transparently, the client will
the file systems it knows of does not yield new NFS4ERR_MOVED receive NFS4ERR_STALE_CLIENTID from the new server, as described
indications) it can destroy the session to release all of its state above, and the client can then reclaim locks as is done in the event
on the server and get back in sync with the server. It should be of server failure.
said, however, that destroying the session clears the aforementioned
lease_moved "state" (if it indeed does so).]] [[Comment.9: Comment
from Trond: What does the error SEQ4_STATUS_LEASE_MOVED mean? A
lease is supposed to be global to the client, whereas fs_locations
returns information about a specific file system. What exactly is
the client expected to do if the original server exported 2 file
systems that are now being migrated to 2 different servers? ( [...]
I still don't see [for example ] what particular operation is the
client guaranteed to be able to perform on each file system?)]]
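The client procedure described in the newer text (reference each file system holding locking state until the SEQ4_STATUS_LEASE_MOVED indication clears) can be sketched as follows. This is a simplified sketch under stated assumptions: find_moved_filesystems, FakeSourceServer, and the probe callback are hypothetical names, and the probe is assumed to report the lease-moved bit as observed after the operation on that file system.

```python
from typing import Callable, Iterable, List, Set, Tuple

# (operation status, SEQ4_STATUS_LEASE_MOVED still set?)
ProbeResult = Tuple[str, bool]


class FakeSourceServer:
    """Hypothetical stand-in for the source server, used only for illustration."""

    def __init__(self, moved: Set[str]) -> None:
        self.pending = set(moved)      # moved file systems not yet reported
        self.probed: List[str] = []    # record of file systems referenced

    def probe(self, fs: str) -> ProbeResult:
        self.probed.append(fs)
        status = "NFS4_OK"
        if fs in self.pending:
            self.pending.discard(fs)
            status = "NFS4ERR_MOVED"
        # The indication stays set until every moved file system
        # has returned NFS4ERR_MOVED to the client.
        return status, bool(self.pending)


def find_moved_filesystems(fs_with_state: Iterable[str],
                           probe: Callable[[str], ProbeResult]) -> List[str]:
    """Reference each file system with locking state; stop once the bit clears."""
    moved: List[str] = []
    for fs in fs_with_state:
        status, lease_moved = probe(fs)
        if status == "NFS4ERR_MOVED":
            moved.append(fs)
        if not lease_moved:
            break  # no other file system has relocated locking state
    return moved
```

For each file system in the returned list, the client would then fetch fs_locations or fs_locations_info and renew or reclaim on the new server as described above.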
11.6.7.2. Transitions and the Lease_time Attribute 11.6.7.2. Transitions and the Lease_time Attribute
In order that the client may appropriately manage its leases in the In order that the client may appropriately manage its leases in the
case of a file system transition, the destination server must case of a file system transition, the destination server must
establish proper values for the lease_time attribute. establish proper values for the lease_time attribute.
When state is transferred transparently, that state should include When state is transferred transparently, that state should include
the correct value of the lease_time attribute. The lease_time the correct value of the lease_time attribute. The lease_time
attribute on the destination server must never be less than that on attribute on the destination server must never be less than that on
the source since this would result in premature expiration of leases the source since this would result in premature expiration of leases
granted by the source server. Upon transitions in which state is granted by the source server. Upon transitions in which state is
transferred transparently, the client is under no obligation to re- transferred transparently, the client is under no obligation to re-
fetch the lease_time attribute and may continue to use the value fetch the lease_time attribute and may continue to use the value
previously fetched (on the source server). previously fetched (on the source server).
If state has not been transferred transparently, either because the If state has not been transferred transparently, either because the
associated servers are show as have different eir_server_scope associated servers are shown as having different eir_server_scope
strings or because the client ID is rejected when presented to the strings or because the client ID is rejected when presented to the
new server, the client should fetch the value of lease_time on the new server, the client should fetch the value of lease_time on the
new (i.e. destination) server, and use it for subsequent locking new (i.e. destination) server, and use it for subsequent locking
requests. However the server must respect a grace period at least as requests. However the server must respect a grace period at least as
long as the lease_time on the source server, in order to ensure that long as the lease_time on the source server, in order to ensure that
clients have ample time to reclaim their lock before potentially clients have ample time to reclaim their lock before potentially
conflicting non-reclaimed locks are granted. conflicting non-reclaimed locks are granted.
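The lease_time constraints above reduce to two comparisons and one client-side choice, sketched here. The function names are hypothetical and all times are assumed to be in seconds.

```python
from typing import Callable


def destination_lease_time_ok(source_lease: int, destination_lease: int) -> bool:
    # With transparent state transfer, the destination's lease_time must
    # never be less than the source's.
    return destination_lease >= source_lease


def grace_period_ok(source_lease: int, grace_period: int) -> bool:
    # Without transparent transfer, the grace period must be at least as
    # long as the lease_time on the source server.
    return grace_period >= source_lease


def client_lease_time(transparent: bool,
                      cached_source_lease: int,
                      fetch_destination_lease: Callable[[], int]) -> int:
    """Pick the lease_time the client uses after the transition."""
    if transparent:
        # No obligation to re-fetch; the previously fetched value stays usable.
        return cached_source_lease
    # Otherwise fetch lease_time from the new (destination) server.
    return fetch_destination_lease()
```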
11.6.8. Write Verifiers and File System Transitions 11.6.8. Write Verifiers and File System Transitions
In a file system transition, the two file systems may be clustered in In a file system transition, the two file systems may be clustered in
the handling of unstably written data. When this is the case, and the handling of unstably written data. When this is the case, and
the two file systems belong to the same _verifier_ class, valid the two file systems belong to the same _write-verifier_ class, write
verifiers from one system may be recognized by the other and verifiers returned from one system may be compared to those returned
superfluous writes avoided. There is no requirement that all valid by the other and superfluous writes avoided.
verifiers be recognized, but it cannot be the case that a verifier is
recognized as valid when it is not. [NOTE: We need to resolve the
issue of proper verifier scope].
When two file systems belong to different _verifier_ classes, the When two file systems belong to different _write-verifier_ classes,
client must assume that all unstable writes in existence at the time any verifier generated by one must not be compared to one provided by
file system transition, have been lost since there is no way the old the other. Instead, it should be treated as not equal even when the
verifier can recognized as valid (or not) on the target server. values are identical.
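The comparison rule in the newer text can be sketched as below. The write_verifiers_match function and the integer representation of the write-verifier class are assumptions made for illustration; only the rule itself comes from the text.

```python
def write_verifiers_match(verifier_a: bytes, class_a: int,
                          verifier_b: bytes, class_b: int) -> bool:
    """Compare two write verifiers, honoring write-verifier classes.

    Verifiers from different classes are treated as not equal even when
    their values are identical, so unstable writes are resent.
    """
    if class_a != class_b:
        return False
    return verifier_a == verifier_b
```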
11.6.9. Readdir Cookies and Verifiers and File System Transitions
In a file system transition, the two file systems may be consistent
in their handling of READDIR cookies and verifiers. When this is the
case, and the two file systems belong to the same _readdir_ class,
READDIR cookies and verifiers from one system may be recognized by
the other and READDIR operations started on one server may be validly
continued on the other, simply by presenting the cookie and verifier
returned by a READDIR operation done on the first filesystem to the
second.
When two file systems belong to different _readdir_ classes, any
READDIR cookie and verifier generated by one is not valid on the
second, and must not be presented to that server by the client. The
client should act as if the verifier was rejected.
11.6.10. File System Data and File System Transitions
When multiple replicas exist and are used simultaneously or in
succession by a client, applications using them will normally expect
that they contain the same data or data which is consistent with
the normal sorts of changes that are made by other clients updating
the data of the file system (with metadata being the same to the
degree indicated by the fs_locations_info attribute). However, when
multiple filesystems are presented as replicas of one another, the
precise relationship between the data of one and the data of another
is not, as a general matter, specified by the NFSv4.1 protocol. It
is quite possible to present as replicas filesystems where the data
of those filesystems is sufficiently different that some applications
have problems dealing with the transition between replicas. The
namespace will typically be constructed so that applications can
choose an appropriate level of support, so that in one position in
the namespace a varied set of replicas will be listed while in
another only those that are up-to-date may be considered replicas.
The protocol does define four special cases of the relationship
among replicas to be specified by the server and relied upon by
clients:
o When multiple server addresses correspond to the same actual
server, as shown by a common so_major_id field within the
eir_server_owner field returned by EXCHANGE_ID, the client may
depend on the fact that changes to data, metadata, or locks made
on one filesystem are immediately reflected on others.
o When multiple replicas exist and are used simultaneously by a
client (see the FSLIB4_CLSIMUL definition within
fs_locations_info), they must designate the same data. Where file
systems are writable, a change made on one instance must be
visible on all instances, immediately upon the earlier of the
return of the modifying requestor or the visibility of that change
on any of the associated replicas. This allows a client to use
these replicas simultaneously without any special adaptation to
the fact that there are multiple replicas. In this case, locks,
whether shared or byte-range, and delegations obtained on one replica
are immediately reflected on all replicas, even though these locks
will be managed under a set of client IDs.
o When one replica is designated as the successor instance to
another existing instance after return of NFS4ERR_MOVED (i.e. the
case of migration), the client may depend on the fact that all
changes securely made to data (uncommitted writes are dealt with
in Section 11.6.8) on the original instance are made to the
successor image.
o Where a file system is not writable but represents a read-only
copy (possibly periodically updated) of a writable file system,
clients have similar requirements with regard to the propagation
of updates. They may need a guarantee that any change visible on
the original file system instance must be immediately visible on
any replica before the client transitions access to that replica,
in order to avoid any possibility that a client, in effecting a
transition to a replica, will see any reversion in file system
state. The specific means by which this will be prevented varies
based on fs4_status_type reported as part of the fs_status
attribute (See Section 11.10). Since these filesystems are
presumed not to be suitable for simultaneous use, there is no
specification of how locking is handled and it generally will be
the case that locks obtained on one filesystem will be separate from
those on others. Since these are going to be read-only
filesystems, this is not expected to pose an issue for clients or
applications.
11.7. Effecting File System Referrals 11.7. Effecting File System Referrals
Referrals are effected when an absent file system is encountered, and Referrals are effected when an absent file system is encountered, and
one or more alternate locations are made available by the one or more alternate locations are made available by the
fs_locations or fs_locations_info attributes. The client will fs_locations or fs_locations_info attributes. The client will
typically get an NFS4ERR_MOVED error, fetch the appropriate location typically get an NFS4ERR_MOVED error, fetch the appropriate location
information and proceed to access the file system on different information and proceed to access the file system on a different
server, even though it retains its logical position within the server, even though it retains its logical position within the
original namespace. original namespace. Referrals differ from migration events in that
they happen only when the client has not previously referenced the
file system in question (so there is nothing to transition).
Referrals can only come into effect when an absent file system is
encountered at its root.
The examples given in the sections below are somewhat artificial in The examples given in the sections below are somewhat artificial in
that an actual client will not typically do a multi-component lookup, that an actual client will not typically do a multi-component lookup,
but will have cached information regarding the upper levels of the but will have cached information regarding the upper levels of the
name hierarchy. However, these examples are chosen to make the name hierarchy. However, these examples are chosen to make the
required behavior clear and easy to put within the scope of a small required behavior clear and easy to put within the scope of a small
number of requests, without getting unduly into details of how number of requests, without getting unduly into details of how
specific clients might choose to cache things. specific clients might choose to cache things.
11.7.1. Referral Example (LOOKUP) 11.7.1. Referral Example (LOOKUP)
skipping to change at page 217, line 20 skipping to change at page 222, line 26
o LOOKUP "this" o LOOKUP "this"
o LOOKUP "is" o LOOKUP "is"
o LOOKUP "the" o LOOKUP "the"
o LOOKUP "path" o LOOKUP "path"
o GETFH o GETFH
o GETATTR fsid,fileid,size,ctime o GETATTR fsid,fileid,size,time_modify
Under the given circumstances, the following will be the result. Under the given circumstances, the following will be the result.
o PUTROOTFH --> NFS_OK. The current fh is now the root of the o PUTROOTFH --> NFS_OK. The current fh is now the root of the
pseudo-fs. pseudo-fs.
o LOOKUP "this" --> NFS_OK. The current fh is for /this and is o LOOKUP "this" --> NFS_OK. The current fh is for /this and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
is within the pseudo-fs. is within the pseudo-fs.
o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path
and is within a new, absent fs, but ... the client will never see and is within a new, absent filesystem, but ... the client will
the value of that fh. never see the value of that fh.
o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent
fs at the start of the operation and the spec makes no exception filesystem at the start of the operation and the spec makes no
for GETFH. exception for GETFH.
o GETATTR fsid,fileid,size,ctime. Not executed because the failure o GETATTR fsid,fileid,size,time_modify. Not executed because the
of the GETFH stops processing of the COMPOUND. failure of the GETFH stops processing of the COMPOUND.
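The result listing above follows from COMPOUND stopping at the first operation that fails. A minimal sketch of that evaluation rule, using a hypothetical run_compound helper and a caller-supplied execute callback, is:

```python
from typing import Callable, List, Tuple


def run_compound(ops: List[str],
                 execute: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Execute operations in order, stopping after the first failure."""
    results: List[Tuple[str, str]] = []
    for op in ops:
        status = execute(op)
        results.append((op, status))
        if status != "NFS_OK":
            break  # later operations are not executed
    return results


def example_execute(op: str) -> str:
    # Assumption for this example: only GETFH fails, because the current
    # filehandle is within an absent file system at that point.
    return "NFS4ERR_MOVED" if op == "GETFH" else "NFS_OK"


EXAMPLE_OPS = [
    "PUTROOTFH",
    'LOOKUP "this"',
    'LOOKUP "is"',
    'LOOKUP "the"',
    'LOOKUP "path"',
    "GETFH",
    "GETATTR",
]
```

Running run_compound(EXAMPLE_OPS, example_execute) yields six results ending with the failed GETFH; the GETATTR never appears.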
Given the failure of the GETFH, the client has the job of determining Given the failure of the GETFH, the client has the job of determining
the root of the absent file system and where to find that file the root of the absent file system and where to find that file
system, i.e. the server and path relative to that server's root fh. system, i.e. the server and path relative to that server's root fh.
Note here that in this example, the client did not obtain filehandles Note here that in this example, the client did not obtain filehandles
and attribute information (e.g. fsid) for the intermediate and attribute information (e.g. fsid) for the intermediate
directories, so that he would not be sure where the absent file directories, so that he would not be sure where the absent file
system starts. It could be the case, for example, that /this/is/the system starts. It could be the case, for example, that /this/is/the
is the root of the moved file system and that the reason that the is the root of the moved file system and that the reason that the
lookup of "path" succeeded is that the file system was not absent on lookup of "path" succeeded is that the file system was not absent on
that op but was moved between the last LOOKUP and the GETFH (since that op but was moved between the last LOOKUP and the GETFH (since
COMPOUND is not atomic). Even if we had the fsid's for all of the COMPOUND is not atomic). Even if we had the fsids for all of the
intermediate directories, we could have no way of knowing that /this/ intermediate directories, we could have no way of knowing that /this/
is/the/path was the root of a new fs, since we don't yet have its is/the/path was the root of a new filesystem, since we don't yet have
fsid. its fsid.
In order to get the necessary information, let us re-issue the chain In order to get the necessary information, let us re-issue the chain
of lookup's with GETFH's and GETATTR's to at least get the fsid's so of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we
we can be sure where the appropriate fs boundaries are. The client can be sure where the appropriate filesystem boundaries are. The
could choose to get fs_locations_info at the same time but in most client could choose to get fs_locations_info at the same time but in
cases the client will have a good guess as to where fs boundaries are most cases the client will have a good guess as to where fs
(because of where NFS4ERR_MOVED was gotten and where not) making boundaries are (because of where NFS4ERR_MOVED was gotten and where
fetching of fs_locations_info unnecessary. not) making fetching of fs_locations_info unnecessary.
OP01: PUTROOTFH --> NFS_OK OP01: PUTROOTFH --> NFS_OK
- Current fh is root of pseudo-fs. - Current fh is root of pseudo-fs.
OP02: GETATTR(fsid) --> NFS_OK OP02: GETATTR(fsid) --> NFS_OK
- Just for completeness. Normally, clients will know the fsid of - Just for completeness. Normally, clients will know the fsid of
the pseudo-fs as soon as they establish communication with a the pseudo-fs as soon as they establish communication with a
server. server.
OP03: LOOKUP "this" --> NFS_OK OP03: LOOKUP "this" --> NFS_OK
OP04: GETATTR(fsid) --> NFS_OK OP04: GETATTR(fsid) --> NFS_OK
- Get current fsid to see where fs boundaries are. The fsid will be - Get current fsid to see where filesystem boundaries are. The fsid
that for the pseudo-fs in this example, so no boundary. will be that for the pseudo-fs in this example, so no boundary.
OP05: GETFH --> NFS_OK OP05: GETFH --> NFS_OK
- Current fh is for /this and is within pseudo-fs. - Current fh is for /this and is within pseudo-fs.
OP06: LOOKUP "is" --> NFS_OK OP06: LOOKUP "is" --> NFS_OK
- Current fh is for /this/is and is within pseudo-fs. - Current fh is for /this/is and is within pseudo-fs.
OP07: GETATTR(fsid) --> NFS_OK OP07: GETATTR(fsid) --> NFS_OK
- Get current fsid to see where fs boundaries are. The fsid will be
that for the pseudo-fs in this example, so no boundary. - Get current fsid to see where filesystem boundaries are. The fsid
will be that for the pseudo-fs in this example, so no boundary.
OP08: GETFH --> NFS_OK OP08: GETFH --> NFS_OK
- Current fh is for /this/is and is within pseudo-fs. - Current fh is for /this/is and is within pseudo-fs.
OP09: LOOKUP "the" --> NFS_OK OP09: LOOKUP "the" --> NFS_OK
- Current fh is for /this/is/the and is within pseudo-fs. - Current fh is for /this/is/the and is within pseudo-fs.
OP10: GETATTR(fsid) --> NFS_OK OP10: GETATTR(fsid) --> NFS_OK
- Get current fsid to see where fs boundaries are. The fsid will be - Get current fsid to see where filesystem boundaries are. The fsid
that for the pseudo-fs in this example, so no boundary. will be that for the pseudo-fs in this example, so no boundary.
OP11: GETFH --> NFS_OK OP11: GETFH --> NFS_OK
- Current fh is for /this/is/the and is within pseudo-fs. - Current fh is for /this/is/the and is within pseudo-fs.
OP12: LOOKUP "path" --> NFS_OK OP12: LOOKUP "path" --> NFS_OK
- Current fh is for /this/is/the/path and is within a new, absent - Current fh is for /this/is/the/path and is within a new, absent
fs, but ... filesystem, but ...
- The client will never see the value of that fh - The client will never see the value of that fh
OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK
- We are getting the fsid to know where the fs boundaries are. Note - We are getting the fsid to know where the filesystem boundaries
that the fsid we are given will not necessarily be preserved at are. Note that the fsid we are given will not necessarily be
the new location. That fsid might be different and in fact the preserved at the new location. That fsid might be different and
fsid we have for this fs might a valid fsid of a different fs on in fact the fsid we have for this filesystem might be a valid fsid
that new server. of a different filesystem on that new server.
- In this particular case, we are pretty sure anyway that what has - In this particular case, we are pretty sure anyway that what has
moved is /this/is/the/path rather than /this/is/the since we have moved is /this/is/the/path rather than /this/is/the since we have
the fsid of the latter and it is that of the pseudo-fs, which the fsid of the latter and it is that of the pseudo-fs, which
presumably cannot move. However, in other examples, we might not presumably cannot move. However, in other examples, we might not
have this kind of information to rely on (e.g. /this/is/the might have this kind of information to rely on (e.g. /this/is/the might
be a non-pseudo file system separate from /this/is/the/path), so be a non-pseudo file system separate from /this/is/the/path), so
we need to have another reliable source of information on the we need to have another reliable source of information on the
boundary of the fs which is moved. If, for example, the file boundary of the fs which is moved. If, for example, the file
system "/this/is" had moved we would have a case of migration system "/this/is" had moved we would have a case of migration
skipping to change at page 220, line 13 skipping to change at page 225, line 15
system was clear we could fetch fs_locations_info. system was clear we could fetch fs_locations_info.
- We are fetching fs_locations_info because the fact that we got an - We are fetching fs_locations_info because the fact that we got an
NFS4ERR_MOVED at this point means that it is most likely that this is NFS4ERR_MOVED at this point means that it is most likely that this is
a referral and we need the destination. Even if it is the case a referral and we need the destination. Even if it is the case
that "/this/is/the" is a file system which has migrated, we will that "/this/is/the" is a file system which has migrated, we will
still need the location information for that file system. still need the location information for that file system.
OP14: GETFH --> NFS4ERR_MOVED OP14: GETFH --> NFS4ERR_MOVED
- Fails because current fh is in an absent fs at the start of the - Fails because current fh is in an absent filesystem at the start
operation and the spec makes no exception for GETFH. Note that of the operation and the spec makes no exception for GETFH. Note
this has the happy consequence that we don't have to worry about that this means the server will never send the client a filehandle
the volatility or lack thereof of the fh. If the root of the fs from within an absent filesystem.
on the new location is a persistent fh, then we can assume that
this fh, which we never saw is a persistent fh, which, if we could
see it, would exactly match the new fh. At least, there is no
evidence to disprove that. On the other hand, if we find a
volatile root at the new location, then the filehandle which we
never saw must have been volatile or at least nobody can prove
otherwise.
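The fsid comparison that the walk above relies on can be sketched as
follows. This is only an illustration under stated assumptions: the
fsid values are hypothetical, an fsid is modeled as a (major, minor)
pair, and find_fs_boundary is a helper name invented here, not part of
the protocol.

```python
from typing import List, Optional, Tuple

Fsid = Tuple[int, int]  # (major, minor); modeling choice for this sketch


def find_fs_boundary(walk: List[Tuple[str, Fsid]]) -> Optional[str]:
    """Return the first path whose fsid differs from its parent's fsid.

    `walk` lists (path, fsid) pairs in the order the client obtained
    them via LOOKUP + GETATTR(fsid).  None means no boundary was seen.
    """
    for (_, parent_fsid), (path, fsid) in zip(walk, walk[1:]):
        if fsid != parent_fsid:
            return path
    return None


PSEUDO_FSID = (0, 0)    # hypothetical fsid of the pseudo-fs
ABSENT_FSID = (17, 42)  # hypothetical fsid reported for the absent fs

example_walk = [
    ("/this", PSEUDO_FSID),
    ("/this/is", PSEUDO_FSID),
    ("/this/is/the", PSEUDO_FSID),
    ("/this/is/the/path", ABSENT_FSID),
]
boundary = find_fs_boundary(example_walk)
```

With the example data, the boundary is found at /this/is/the/path,
which matches the conclusion drawn at OP13.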
Given the above, the client knows where the root of the absent file Given the above, the client knows where the root of the absent file
system is, by noting where the change of fsid occurred. The system is, by noting where the change of fsid occurred. The
fs_locations_info attribute also gives the client the actual location fs_locations_info attribute also gives the client the actual location
of the absent file system, so that the referral can proceed. The of the absent file system, so that the referral can proceed. The
server gives the client the bare minimum of information about the server gives the client the bare minimum of information about the
absent file system so that there will be very little scope for absent file system so that there will be very little scope for
problems of conflict between information sent by the referring server problems of conflict between information sent by the referring server
and information of the file system's home. No filehandles and very and information of the file system's home. No filehandles and very
few attributes are present on the referring server and the client can few attributes are present on the referring server and the client can
skipping to change at page 221, line 4 skipping to change at page 225, line 47
Suppose such a directory is read as follows: Suppose such a directory is read as follows:
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
o LOOKUP "is" o LOOKUP "is"
o LOOKUP "the" o LOOKUP "the"
o READDIR (fsid, size, ctime, mounted_on_fileid)
o READDIR (fsid, size, time_modify, mounted_on_fileid)
In this case, because rdattr_error is not requested, In this case, because rdattr_error is not requested,
fs_locations_info is not requested, and some of the attributes cannot be fs_locations_info is not requested, and some of the attributes cannot be
provided the result will be an NFS4ERR_MOVED error on the READDIR, provided, the result will be an NFS4ERR_MOVED error on the READDIR,
with the detailed results as follows: with the detailed results as follows:
o PUTROOTFH --> NFS_OK. The current fh is at the root of the o PUTROOTFH --> NFS_OK. The current fh is at the root of the
pseudo-fs. pseudo-fs.
o LOOKUP "this" --> NFS_OK. The current fh is for /this and is o LOOKUP "this" --> NFS_OK. The current fh is for /this and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
is within the pseudo-fs. is within the pseudo-fs.
o READDIR (fsid, size, ctime, mounted_on_fileid) --> NFS4ERR_MOVED. o READDIR (fsid, size, time_modify, mounted_on_fileid) -->
Note that the same error would have been returned if /this/is/the NFS4ERR_MOVED. Note that the same error would have been returned
had migrated, when in fact it is because the directory contains if /this/is/the had migrated, when in fact it is because the
the root of an absent fs. directory contains the root of an absent filesystem.
So now suppose that we reissue with rdattr_error: So now suppose that we reissue with rdattr_error:
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
o LOOKUP "is" o LOOKUP "is"
o LOOKUP "the" o LOOKUP "the"
o READDIR (rdattr_error, fsid, size, ctime, mounted_on_fileid) o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid)
The results will be: The results will be:
o PUTROOTFH --> NFS_OK. The current fh is at the root of the o PUTROOTFH --> NFS_OK. The current fh is at the root of the
pseudo-fs. pseudo-fs.
o LOOKUP "this" --> NFS_OK. The current fh is for /this and is o LOOKUP "this" --> NFS_OK. The current fh is for /this and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
is within the pseudo-fs. is within the pseudo-fs.
o READDIR (rdattr_error, fsid, size, ctime, mounted_on_fileid) --> o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid)
NFS_OK. The attributes for "path" will only contain rdattr_error --> NFS_OK. The attributes for "path" will only contain
with the value will be NFS4ERR_MOVED, together with an fsid value rdattr_error with the value NFS4ERR_MOVED, together with an fsid
and an a value for mounted_on_fileid. value and a value for mounted_on_fileid.
So suppose we do another READDIR to get fs_locations_info, although So suppose we do another READDIR to get fs_locations_info, although
we could have used a GETATTR directly, as in the previous section. we could have used a GETATTR directly, as in the previous section.
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
o LOOKUP "is" o LOOKUP "is"
o LOOKUP "the" o LOOKUP "the"
o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid,
size, ctime) size, time_modify)
The results would be: The results would be:
o PUTROOTFH --> NFS_OK. The current fh is at the root of the o PUTROOTFH --> NFS_OK. The current fh is at the root of the
pseudo-fs. pseudo-fs.
o LOOKUP "this" --> NFS_OK. The current fh is for /this and is o LOOKUP "this" --> NFS_OK. The current fh is for /this and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is
within the pseudo-fs. within the pseudo-fs.
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
is within the pseudo-fs. is within the pseudo-fs.
o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid,
size, ctime) --> NFS_OK. The attributes will be as shown below. size, time_modify) --> NFS_OK. The attributes will be as shown
below.
The attributes for "path" will only contain The attributes for "path" will only contain
o rdattr_error (value: NFS4ERR_MOVED) o rdattr_error (value: NFS_OK)
o fs_locations_info ) o fs_locations_info
o mounted_on_fileid (value: unique fileid within referring fs) o mounted_on_fileid (value: unique fileid within referring fs)
o fsid (value: unique value within referring server)
The attribute entry for "latest" will not contain size or ctime.
11.8. The Attribute fs_absent
In order to provide the client information about whether the current o fsid (value: unique value within referring server)
file system is present or absent, the fs_absent attribute may be
interrogated.
As noted above, this attribute, when supported, may be requested of The attribute entry for "path" will not contain size or time_modify
absent file systems without causing NFS4ERR_MOVED to be returned and because these attributes are not available within an absent
it should always be available. Servers are strongly urged to support filesystem.
this attribute on all file systems if they support it on any file
system.
11.9. The Attribute fs_locations 11.8. The Attribute fs_locations
The fs_locations attribute is structured in the following way: The fs_locations attribute is structured in the following way:
struct fs_location { struct fs_location {
utf8str_cis server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
struct fs_locations { struct fs_locations {
pathname4 fs_root; pathname4 fs_root;
fs_location locations<>; fs_location locations<>;
}; };
The fs_location struct is used to represent the location of a file The fs_location struct is used to represent the location of a file
system by providing a server name and the path to the root of the system by providing a server name and the path to the root of the
file system within that server's namespace. When a set of servers file system within that server's namespace. When a set of servers
have corresponding file systems at the same path within their have corresponding file systems at the same path within their
namespaces, an array of server names may be provided. An entry in namespaces, an array of server names may be provided. An entry in
the server array is an UTF8 string and represents one of a the server array is a UTF8 string and represents one of a traditional
traditional DNS host name, IPv4 address, or IPv6 address. It is not DNS host name, IPv4 address, or IPv6 address, or a zero-length
a requirement that all servers that share the same rootpath be listed string. A null string SHOULD be used to indicate the current address
in one fs_location struct. The array of server names is provided for being used for the RPC call. It is not a requirement that all
convenience. Servers that share the same rootpath may also be listed servers that share the same rootpath be listed in one fs_location
in separate fs_location entries in the fs_locations attribute. struct. The array of server names is provided for convenience.
Servers that share the same rootpath may also be listed in separate
fs_location entries in the fs_locations attribute.
The fs_locations struct and attribute contains an array of such The fs_locations struct and attribute contains an array of such
locations. Since the namespace of each server may be constructed locations. Since the namespace of each server may be constructed
differently, the "fs_root" field is provided. The path represented differently, the "fs_root" field is provided. The path represented
by fs_root represents the location of the file system in the current by fs_root represents the location of the file system in the current
server's namespace, i.e. that of the server from which the server's namespace, i.e. that of the server from which the
fs_locations attribute was obtained. The fs_root path is meant to fs_locations attribute was obtained. The fs_root path is meant to
aid the client by clearly referencing the root of the file system aid the client by clearly referencing the root of the file system
whose locations are being reported, no matter what object within the whose locations are being reported, no matter what object within the
current file system, the current filehandle designates. current file system the current filehandle designates. When the
fs_locations attribute is interrogated and there are no alternate
file system locations, the server SHOULD return a zero-length array
of fs_location structures, together with a valid fs_root.
As an example, suppose there is a replicated file system located at As an example, suppose there is a replicated file system located at
two servers (servA and servB). At servA, the file system is located two servers (servA and servB). At servA, the file system is located
at path "/a/b/c". At servB, the file system is located at path at path "/a/b/c". At servB, the file system is located at path
"/x/y/z". If the client were to obtain the fs_locations value for "/x/y/z". If the client were to obtain the fs_locations value for
the directory at "/a/b/c/d", it might not necessarily know that the the directory at "/a/b/c/d", it might not necessarily know that the
file system's root is located in servA's namespace at "/a/b/c". When file system's root is located in servA's namespace at "/a/b/c". When
the client switches to servB, it will need to determine that the the client switches to servB, it will need to determine that the
directory it first referenced at servA is now represented by the path directory it first referenced at servA is now represented by the path
"/x/y/z/d" on servB. To facilitate this, the fs_locations attribute "/x/y/z/d" on servB. To facilitate this, the fs_locations attribute
provided by servA would have a fs_root value of "/a/b/c" and two provided by servA would have a fs_root value of "/a/b/c" and two
entries in fs_locations. One entry in fs_locations will be for entries in fs_locations. One entry in fs_locations will be for
itself (servA) and the other will be for servB with a path of itself (servA) and the other will be for servB with a path of
"/x/y/z". With this information, the client is able to substitute "/x/y/z". With this information, the client is able to substitute
"/x/y/z" for the "/a/b/c" at the beginning of its access path and "/x/y/z" for the "/a/b/c" at the beginning of its access path and
construct "/x/y/z/d" to use for the new server. construct "/x/y/z/d" to use for the new server.
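The substitution described in this example can be sketched as below.
The helper names (split_path, translate_path) are invented for
illustration, and paths are handled as slash-separated strings purely
for readability; on the wire, pathname4 is an array of components.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a slash-separated path into its non-empty components."""
    return [c for c in path.split("/") if c]


def translate_path(path: str, fs_root: str, rootpath: str) -> str:
    """Replace the fs_root prefix of `path` with the new server's rootpath."""
    comps, root = split_path(path), split_path(fs_root)
    if comps[:len(root)] != root:
        raise ValueError("path is not within the file system at fs_root")
    return "/" + "/".join(split_path(rootpath) + comps[len(root):])


# servA reports fs_root "/a/b/c"; the servB entry has rootpath "/x/y/z".
new_path = translate_path("/a/b/c/d", "/a/b/c", "/x/y/z")
```

Comparing whole components rather than raw string prefixes avoids
treating "/a/b/cd" as if it were inside "/a/b/c".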
Since the fs_locations attribute lacks information defining various Since the fs_locations attribute lacks information defining various
attributes of the various file system choices presented, it should attributes of the various file system choices presented, it SHOULD
only be interrogated and used when fs_locations_info is not only be interrogated and used when fs_locations_info is not
available. When fs_locations is used, information about the specific available. When fs_locations is used, information about the specific
locations should be assumed based on the following rules. locations should be assumed based on the following rules.
The following rules are general and apply irrespective of the The following rules are general and apply irrespective of the
context. context.
o All listed file system instances should be considered as of the o All listed file system instances should be considered as of the
same _handle_ class, if and only if, the current fh_expire_type same _handle_ class, if and only if, the current fh_expire_type
attribute does not include the FH4_VOL_MIGRATION bit. Note that attribute does not include the FH4_VOL_MIGRATION bit. Note that
skipping to change at page 225, line 5 skipping to change at page 229, line 41
same _fileid_ class, if and only if, the fh_expire_type attribute same _fileid_ class, if and only if, the fh_expire_type attribute
indicates persistent filehandles and does not include the indicates persistent filehandles and does not include the
FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid
issues do not apply since there can be no fileids known within the issues do not apply since there can be no fileids known within the
referring (absent) file system nor is there any access to the referring (absent) file system nor is there any access to the
fh_expire_type attribute. fh_expire_type attribute.
o All file system instances should be considered as of o All file system instances should be considered as of
different _change_ classes. different _change_ classes.
For other class assignments, handling depends of file system For other class assignments, handling of file system transitions
transitions depends on the reasons for the transition: depends on the reasons for the transition:
o When the transition is due to migration, the target should be o When the transition is due to migration, that is the client was
treated as being of the same _verifier_ class as the source. directed to a new filesystem after receiving an NFS4ERR_MOVED error,
the target should be treated as being of the same _verifier_ class
as the source.
o When the transition is due to failover to another replica, the o When the transition is due to failover to another replica, that
target should be treated as being of a different _verifier_ class is, the client selected another replica without receiving an
from the source. NFS4ERR_MOVED error, the target should be treated as being of a
different _verifier_ class from the source.
The specific choices reflect typical implementation patterns for The specific choices reflect typical implementation patterns for
failover and controlled migration respectively. Since other choices failover and controlled migration respectively. Since other choices
are possible and useful, this information is better obtained by using are possible and useful, this information is better obtained by using
fs_locations_info. fs_locations_info. When a server implementation needs to communicate
other choices, it MUST support the fs_locations_info attribute.
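A client that has only fs_locations to work with might encode the
default assumptions above roughly as follows. This is a sketch, not a
normative algorithm: the function name and the boolean inputs are
assumptions made here, and the _fileid_ class is left out because part
of that rule is not shown in this excerpt.

```python
from typing import Dict


def assumed_classes(received_moved: bool,
                    vol_migration_bit: bool) -> Dict[str, bool]:
    """Return, per class, whether the target is assumed to be in the
    same class as the source (True) or a different one (False).

    received_moved:    the transition followed an NFS4ERR_MOVED error
                       (migration) rather than a client-chosen failover.
    vol_migration_bit: fh_expire_type includes FH4_VOL_MIGRATION.
    """
    return {
        "handle": not vol_migration_bit,  # same iff the bit is clear
        "change": False,                  # always treated as different
        "verifier": received_moved,       # same only for migration
    }


migration = assumed_classes(received_moved=True, vol_migration_bit=False)
failover = assumed_classes(received_moved=False, vol_migration_bit=False)
```

Anything beyond these defaults has to come from fs_locations_info.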
See the section "Security Considerations" for a discussion on the See the section "Security Considerations" for a discussion on the
recommendations for the security flavor to be used by any GETATTR recommendations for the security flavor to be used by any GETATTR
operation that requests the "fs_locations" attribute. operation that requests the "fs_locations" attribute.
11.10. The Attribute fs_locations_info 11.9. The Attribute fs_locations_info
The fs_locations_info attribute is intended as a more functional The fs_locations_info attribute is intended as a more functional
replacement for fs_locations which will continue to exist and be replacement for fs_locations which will continue to exist and be
supported. Clients can use it get a more complete set of information supported. Clients can use it to get a more complete set of
about alternative file system locations. When the server does not information about alternative file system locations. When the server
support fs_locations_info, fs_locations can be used to get a subset does not support fs_locations_info, fs_locations can be used to get a
of the information. A server which supports fs_locations_info MUST subset of the information. A server which supports fs_locations_info
support fs_locations as well. MUST support fs_locations as well.
There is additional information present in fs_locations_info, that is There is additional information present in fs_locations_info, that is
not available in fs_locations: not available in fs_locations:
o Attribute continuity information to allow a client to select a o Attribute continuity information to allow a client to select a
location which meets the transparency requirements of the location which meets the transparency requirements of the
applications accessing the data and to take advantage of applications accessing the data and to take advantage of
optimizations that server guarantees as to attribute continuity optimizations that server guarantees as to attribute continuity
may provide (e.g. change attribute). may provide (e.g. change attribute).
o File System identity information which indicates when multiple o File System identity information which indicates when multiple
replicas, from the clients point of view, correspond to the same replicas, from the client's point of view, correspond to the same
target file system, allowing them to be used interchangeably, target file system, allowing them to be used interchangeably,
without disruption, as multiple paths to the same thing. without disruption, as multiple paths to the same thing.
o Information which will bear on the suitability of various o Information which will bear on the suitability of various
replicas, depending on the use that the client intends. For replicas, depending on the use that the client intends. For
example, many applications need an absolutely up-to-date copy example, many applications need an absolutely up-to-date copy
(e.g. those that write), while others may only need access to the (e.g. those that write), while others may only need access to the
most up-to-date copy reasonably available. most up-to-date copy reasonably available.
o Server-derived preference information for replicas, which can be o Server-derived preference information for replicas, which can be
used to implement load-balancing while giving the client the used to implement load-balancing while giving the client the
entire fs list to be used in case the primary fails. entire filesystem list to be used in case the primary fails.
The fs_locations_info attribute is structured similarly to the
fs_locations attribute. A top-level structure (fs_locations_info4)
contains the entire attribute including the root pathname of the
filesystem and an array of lower-level structures that define
replicas that share a common root path on their respective servers.
The lower-level structure in turn (fs_locations_item4) contains a
specific pathname and information on one or more individual server
replicas. At that last, lowest level, fs_locations_info has a
fs_locations_server4 structure that contains per-server-replica
information in addition to the server name. This per-server-replica
information includes a nominally opaque array, fls_info, in which
specific pieces of information are located at the specific indices
listed below.
The attribute will always contain at least a single
fs_locations_server4 entry. Typically, this will be an entry with the
FSLI4GF_CUR_REQ flag set, although in the case of a referral there
will be no entry with that flag set.
It should be noted that fs_locations_info attributes returned by
servers for various replicas may differ for various reasons. One
server may know about a set of replicas that are not known to other
servers. Further, compatibility attributes may differ. Filehandles
may be of the same class going from replica A to replica B but not
going in the reverse direction. This may happen because the
filehandles are the same but the server implementation for the server
on which replica B resides may not have provision to note and report
that equivalence.
The fs_locations_info attribute consists of a root pathname (just The fs_locations_info attribute consists of a root pathname (just
like fs_locations), together with an array of fs_locations_item4 like fs_locations), together with an array of fs_locations_item4
structures. structures. The fs_locations_item4 structures in turn consist of a
root pathname together with an array of
/*
* Defines an individual server replica
*/
struct fs_locations_server4 { struct fs_locations_server4 {
int32_t fls_currency; int32_t fls_currency;
opaque fls_info<>; opaque fls_info<>;
utf8str_cis fls_server; utf8str_cis fls_server;
}; };
/*
* Byte indices of items within fls_info: flag fields, class numbers,
* bytes indicating ranks and orders.
*/
const FSLI4BX_GFLAGS = 0; const FSLI4BX_GFLAGS = 0;
const FSLI4BX_TFLAGS = 1; const FSLI4BX_TFLAGS = 1;
const FSLI4BX_CLSIMUL = 2; const FSLI4BX_CLSIMUL = 2;
const FSLI4BX_CLHANDLE = 3; const FSLI4BX_CLHANDLE = 3;
const FSLI4BX_CLFILEID = 4; const FSLI4BX_CLFILEID = 4;
const FSLI4BX_CLVERIFIER = 5; const FSLI4BX_CLWRITEVER = 5;
const FSLI4BX_CHANGE = 6; const FSLI4BX_CLCHANGE = 6;
const FSLI4BX_CLREADDIR = 7;
const FSLI4BX_READRANK = 7; const FSLI4BX_READRANK = 8;
const FSLI4BX_WRITERANK = 8; const FSLI4BX_WRITERANK = 9;
const FSLI4BX_READORDER = 9; const FSLI4BX_READORDER = 10;
const FSLI4BX_WRITEORDER = 10; const FSLI4BX_WRITEORDER = 11;
/*
* Bits defined within the general flag byte.
*/
const FSLI4GF_WRITABLE = 0x01; const FSLI4GF_WRITABLE = 0x01;
const FSLI4GF_CUR_REQ = 0x02; const FSLI4GF_CUR_REQ = 0x02;
const FSLI4GF_ABSENT = 0x04; const FSLI4GF_ABSENT = 0x04;
const FSLI4GF_GOING = 0x08; const FSLI4GF_GOING = 0x08;
const FSLI4GF_SPLIT = 0x10; const FSLI4GF_SPLIT = 0x10;
/*
* Bits defined within the transport flag byte.
*/
const FSLI4TF_RDMA = 0x01; const FSLI4TF_RDMA = 0x01;
/*
* Defines a set of replicas sharing a common value of the root
* path within the corresponding single-server namespaces.
*/
struct fs_locations_item4 { struct fs_locations_item4 {
fs_locations_server4 fli_entries<>; fs_locations_server4 fli_entries<>;
pathname4 fli_rootpath; pathname4 fli_rootpath;
}; };
/*
* Defines the overall structure of the fs_locations_info attribute.
*/
struct fs_locations_info4 { struct fs_locations_info4 {
uint32_t fli_flags; uint32_t fli_flags;
int32_t fli_valid_for;
pathname4 fli_fs_root; pathname4 fli_fs_root;
fs_locations_item4 fli_items<>; fs_locations_item4 fli_items<>;
}; };
/*
* Flag bits in fli_flags.
*/
const FSLI4IF_VAR_SUB = 0x00000001; const FSLI4IF_VAR_SUB = 0x00000001;
typedef fs_locations_info4 fattr4_fs_locations_info; typedef fs_locations_info4 fattr4_fs_locations_info;
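As an illustration of how the byte indices and flag bits fit together,
the following sketch decodes a few fields of an fls_info array. It
uses the index values from the newer revision shown above. The
function name, the returned dictionary layout, and the choice to
report bytes missing from a short array as None are assumptions of
this sketch, not behavior defined by the draft.

```python
from typing import Dict, Optional

# Byte indices within fls_info (newer revision's numbering).
FSLI4BX_GFLAGS = 0
FSLI4BX_TFLAGS = 1
FSLI4BX_READRANK = 8
FSLI4BX_WRITERANK = 9

# Bits within the general flag byte.
GENERAL_FLAGS: Dict[str, int] = {
    "WRITABLE": 0x01,
    "CUR_REQ": 0x02,
    "ABSENT": 0x04,
    "GOING": 0x08,
    "SPLIT": 0x10,
}

# Bit within the transport flag byte.
FSLI4TF_RDMA = 0x01


def decode_fls_info(info: bytes) -> dict:
    """Decode selected fields of fls_info (sketch only)."""
    def byte_at(index: int) -> Optional[int]:
        # Assumption: an index beyond the array is reported as None.
        return info[index] if index < len(info) else None

    gflags = byte_at(FSLI4BX_GFLAGS) or 0
    tflags = byte_at(FSLI4BX_TFLAGS) or 0
    return {
        "general_flags": sorted(
            name for name, bit in GENERAL_FLAGS.items() if gflags & bit
        ),
        "rdma": bool(tflags & FSLI4TF_RDMA),
        "read_rank": byte_at(FSLI4BX_READRANK),
        "write_rank": byte_at(FSLI4BX_WRITERANK),
    }


sample = bytes([0x03, 0x01, 0, 0, 0, 0, 0, 0, 5, 9, 0, 0])
decoded = decode_fls_info(sample)
```

Indexing by named constant rather than by fixed structure offsets
mirrors the stated intent of keeping fls_info a nominally opaque array.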
The fs_locations_info attribute is structured similarly to the
fs_locations attribute. A top-level structure (fs_locations_info4)
contains the entire attribute including the root pathname of the fs
and an array of lower-level structures that define replicas that
share a common root path on their respective servers. The lower-
level structure in turn ( fs_locations_item4) contain a specific
pathname and information on one or more individual server replicas.
For that last lowest-level fs_locations_info has a
fs_locations_server4 structure that contains per-server-replica
information in addition to the server name.
As noted above, the fs_locations_info attribute, when supported, may As noted above, the fs_locations_info attribute, when supported, may
be requested of absent file systems without causing NFS4ERR_MOVED to be requested of absent file systems without causing NFS4ERR_MOVED to
be returned and it is generally expected that it will be available be returned and it is generally expected that it will be available
for both present and absent file systems even if only a single for both present and absent file systems even if only a single
fs_locations_server4 entry is present, designating the current fs_locations_server4 entry is present, designating the current
(present) file system, or two fs_locations_server4 entries (present) file system, or two fs_locations_server4 entries
designating the current (and now previous) location of an absent file designating the previous location of an absent file system (the one
system and its successor location. Servers are strongly urged to just referenced) and its successor location. Servers are strongly
support this attribute on all file systems if they support it on any urged to support this attribute on all file systems if they support
file system. it on any file system.
11.10.1. The fs_locations_server4 Structure The data presented in the fs_locations_info attribute may be obtained
by the server in any number of ways, including specification by the
administrator or by current protocols for transferring data among
replicas and protocols not yet developed. NFS version 4.1 only
defines how this information is presented by the server to the
client.
11.9.1. The fs_locations_server4 Structure
The fs_locations_server4 structure consists of the following items: The fs_locations_server4 structure consists of the following items:
o An indication of file system up-to-date-ness (fls_currency) in o An indication of file system up-to-date-ness (fls_currency) in
terms of approximate seconds before the present. A negative value terms of approximate seconds before the present. This value is
indicates that the server is unable to give any reasonably useful relative to the master copy. A negative value indicates that the
value here. A zero indicates that the file system is the actual server is unable to give any reasonably useful value here. A zero
writable data or a reliably coherent and fully up-to-date copy. indicates that the file system is the actual writable data or a
Positive values indicate how out- of-date this copy can normally reliably coherent and fully up-to-date copy. Positive values
be before it is considered for update. Such a value is not a indicate how out-of-date this copy can normally be before it is
guarantee that such updates will always be performed on the considered for update. Such a value is not a guarantee that such
required schedule but instead serves as a hint about how far behind updates will always be performed on the required schedule but
the most up-to-date copy of the data, this copy would normally be instead serves as a hint about how far the copy of the data would
expected to be. be expected to be behind the most up-to-date copy.
o A counted array of one-octet values (fls_info) containing o A counted array of one-octet values (fls_info) containing
information about the particular file system instance. This data information about the particular file system instance. This data
includes general flags, transport capability flags, file system includes general flags, transport capability flags, file system
equivalence class information, and selection priority information. equivalence class information, and selection priority information.
The encoding will be discussed below. The encoding will be discussed below.
o The server string (fls_server). For the case of the replica o The server string (fls_server). For the case of the replica
currently being accessed (via GETATTR), a null string may be used currently being accessed (via GETATTR), a null string MAY be used
to indicate the current address being used for the RPC call. to indicate the current address being used for the RPC call.
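The three cases for fls_currency described above might be handled by a
client along these lines. The function name and the returned strings
are illustrative assumptions only; the draft defines the meaning of the
value, not how a client reports it.

```python
def describe_currency(fls_currency: int) -> str:
    """Summarize an fls_currency value (sketch only)."""
    if fls_currency < 0:
        # The server cannot give a reasonably useful value.
        return "unknown"
    if fls_currency == 0:
        # The writable data itself, or a coherent up-to-date copy.
        return "current"
    # A hint, not a guarantee, of how far behind the copy may be.
    return f"may lag by about {fls_currency} seconds"


examples = [describe_currency(v) for v in (-1, 0, 300)]
```

A client that writes would typically want only replicas reported as
current, while a read-mostly client might accept a bounded lag.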
Data within the fls_info array, is in the form of 8-bit data items Data within the fls_info array is in the form of 8-bit data items
with constants giving the offsets within the array of various values with constants giving the offsets within the array of various values
describing this particular file system instance. This style of describing this particular file system instance. This style of
definition was chosen, in preference to explicit XDR structure definition was chosen, in preference to explicit XDR structure
definitions