draft-ietf-nfsv4-rfc3530bis-26.txt   draft-ietf-nfsv4-rfc3530bis-27.txt 
NFSv4 T. Haynes, Ed. NFSv4 T. Haynes, Ed.
Internet-Draft NetApp Internet-Draft NetApp
Obsoletes: 3530 (if approved) D. Noveck, Ed. Obsoletes: 3530 (if approved) D. Noveck, Ed.
Intended status: Standards Track EMC Intended status: Standards Track EMC
Expires: November 9, 2013 May 08, 2013 Expires: February 17, 2014 August 16, 2013
Network File System (NFS) Version 4 Protocol Network File System (NFS) Version 4 Protocol
draft-ietf-nfsv4-rfc3530bis-26.txt draft-ietf-nfsv4-rfc3530bis-27.txt
Abstract Abstract
The Network File System (NFS) version 4 is a distributed filesystem The Network File System (NFS) version 4 is a distributed file system
protocol which owes heritage to NFS protocol version 2, RFC 1094, and protocol which builds on the heritage of NFS protocol version 2, RFC
version 3, RFC 1813. Unlike earlier versions, the NFS version 4 1094, and version 3, RFC 1813. Unlike earlier versions, the NFS
protocol supports traditional file access while integrating support version 4 protocol supports traditional file access while integrating
for file locking and the mount protocol. In addition, support for support for file locking and the mount protocol. In addition,
strong security (and its negotiation), compound operations, client support for strong security (and its negotiation), compound
caching, and internationalization have been added. Of course, operations, client caching, and internationalization have been added.
attention has been applied to making NFS version 4 operate well in an Of course, attention has been applied to making NFS version 4 operate
Internet environment. well in an Internet environment.
This document, together with the companion XDR description document, This document, together with the companion XDR description document,
RFCNFSv4XDR, obsoletes RFC 3530 as the definition of the NFS version RFCNFSv4XDR, obsoletes RFC 3530 as the definition of the NFS version
4 protocol. 4 protocol.
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
skipping to change at page 1, line 49 skipping to change at page 1, line 49
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 9, 2013. This Internet-Draft will expire on February 17, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 7 skipping to change at page 3, line 7
modifications of such material outside the IETF Standards Process. modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 9 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1. NFS Version 4 Goals . . . . . . . . . . . . . . . . . . 9 1.1. NFS Version 4 Goals . . . . . . . . . . . . . . . . . . 8
1.2. Inconsistencies of this Document with the companion 1.2. Definitions in the companion document NFS Version 4
document NFS Version 4 Protocol . . . . . . . . . . . . 10 Protocol are Authoritative . . . . . . . . . . . . . . . 8
1.3. Overview of NFSv4 Features . . . . . . . . . . . . . . . 10 1.3. Overview of NFSv4 Features . . . . . . . . . . . . . . . 9
1.3.1. RPC and Security . . . . . . . . . . . . . . . . . . 10 1.3.1. RPC and Security . . . . . . . . . . . . . . . . . . 9
1.3.2. Procedure and Operation Structure . . . . . . . . . 10 1.3.2. Procedure and Operation Structure . . . . . . . . . 9
1.3.3. Filesystem Model . . . . . . . . . . . . . . . . . . 11 1.3.3. Filesystem Model . . . . . . . . . . . . . . . . . . 10
1.3.4. OPEN and CLOSE . . . . . . . . . . . . . . . . . . . 13 1.3.4. OPEN and CLOSE . . . . . . . . . . . . . . . . . . . 12
1.3.5. File Locking . . . . . . . . . . . . . . . . . . . . 13 1.3.5. File Locking . . . . . . . . . . . . . . . . . . . . 12
1.3.6. Client Caching and Delegation . . . . . . . . . . . 14 1.3.6. Client Caching and Delegation . . . . . . . . . . . 12
1.4. General Definitions . . . . . . . . . . . . . . . . . . 14 1.4. General Definitions . . . . . . . . . . . . . . . . . . 13
1.5. Changes since RFC 3530 . . . . . . . . . . . . . . . . . 16 1.5. Changes since RFC 3530 . . . . . . . . . . . . . . . . . 15
1.6. Changes since RFC 3010 . . . . . . . . . . . . . . . . . 17 1.6. Changes since RFC 3010 . . . . . . . . . . . . . . . . . 16
2. Protocol Data Types . . . . . . . . . . . . . . . . . . . . . 18 2. Protocol Data Types . . . . . . . . . . . . . . . . . . . . . 17
2.1. Basic Data Types . . . . . . . . . . . . . . . . . . . . 18 2.1. Basic Data Types . . . . . . . . . . . . . . . . . . . . 17
2.2. Structured Data Types . . . . . . . . . . . . . . . . . 20 2.2. Structured Data Types . . . . . . . . . . . . . . . . . 19
3. RPC and Security Flavor . . . . . . . . . . . . . . . . . . . 24 3. RPC and Security Flavor . . . . . . . . . . . . . . . . . . . 23
3.1. Ports and Transports . . . . . . . . . . . . . . . . . . 25 3.1. Ports and Transports . . . . . . . . . . . . . . . . . . 23
3.1.1. Client Retransmission Behavior . . . . . . . . . . . 26 3.1.1. Client Retransmission Behavior . . . . . . . . . . . 24
3.2. Security Flavors . . . . . . . . . . . . . . . . . . . . 26 3.2. Security Flavors . . . . . . . . . . . . . . . . . . . . 25
3.2.1. Security mechanisms for NFSv4 . . . . . . . . . . . 27 3.2.1. Security mechanisms for NFSv4 . . . . . . . . . . . 25
3.3. Security Negotiation . . . . . . . . . . . . . . . . . . 28 3.3. Security Negotiation . . . . . . . . . . . . . . . . . . 26
3.3.1. SECINFO . . . . . . . . . . . . . . . . . . . . . . 28 3.3.1. SECINFO . . . . . . . . . . . . . . . . . . . . . . 27
3.3.2. Security Error . . . . . . . . . . . . . . . . . . . 29 3.3.2. Security Error . . . . . . . . . . . . . . . . . . . 27
3.3.3. Callback RPC Authentication . . . . . . . . . . . . 29 3.3.3. Callback RPC Authentication . . . . . . . . . . . . 27
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 30 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 30 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 28
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 31 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 29
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 31 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 29
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 31 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 30
4.2.1. General Properties of a Filehandle . . . . . . . . . 32 4.2.1. General Properties of a Filehandle . . . . . . . . . 30
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 32 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 31
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 33 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 31
4.2.4. One Method of Constructing a Volatile Filehandle . . 34 4.2.4. One Method of Constructing a Volatile Filehandle . . 32
4.3. Client Recovery from Filehandle Expiration . . . . . . . 34 4.3. Client Recovery from Filehandle Expiration . . . . . . . 33
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 35 5. Attributes . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 36 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 35
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 37 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 35
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 37 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 36
5.4. Classification of Attributes . . . . . . . . . . . . . . 39 5.4. Classification of Attributes . . . . . . . . . . . . . . 37
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 39 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 38
5.6. REQUIRED Attributes - List and Definition References . . 40 5.6. REQUIRED Attributes - List and Definition References . . 38
5.7. RECOMMENDED Attributes - List and Definition 5.7. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 41 References . . . . . . . . . . . . . . . . . . . . . . . 39
5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 42 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 40
5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 42 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 40
5.8.2. Definitions of Uncategorized RECOMMENDED 5.8.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 44 Attributes . . . . . . . . . . . . . . . . . . . . . 42
5.9. Interpreting owner and owner_group . . . . . . . . . . . 50 5.9. Interpreting owner and owner_group . . . . . . . . . . . 48
5.10. Character Case Attributes . . . . . . . . . . . . . . . 53 5.10. Character Case Attributes . . . . . . . . . . . . . . . 51
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 53 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 51
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 54 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 52
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 54 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 52
6.2.2. Attribute 33: mode . . . . . . . . . . . . . . . . . 68 6.2.2. Attribute 33: mode . . . . . . . . . . . . . . . . . 67
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 69 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 67
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 69 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 67
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 70 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 68
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 71 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 69
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 72 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 70
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 73 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 71
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 73 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 71
7. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 75 7. NFS Server Name Space . . . . . . . . . . . . . . . . . . . . 73
7.1. Location Attributes . . . . . . . . . . . . . . . . . . 75 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 73
7.2. File System Presence or Absence . . . . . . . . . . . . 76 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 73
7.3. Getting Attributes for an Absent File System . . . . . . 77 7.3. Server Pseudo Filesystem . . . . . . . . . . . . . . . . 74
7.3.1. GETATTR Within an Absent File System . . . . . . . . 77 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 74
7.3.2. READDIR and Absent File Systems . . . . . . . . . . 78 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 75
7.4. Uses of Location Information . . . . . . . . . . . . . . 78 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 75
7.4.1. File System Replication . . . . . . . . . . . . . . 79 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 75
7.4.2. File System Migration . . . . . . . . . . . . . . . 80 7.8. Security Policy and Name Space Presentation . . . . . . 76
7.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 81 8. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 76
7.5. Location Entries and Server Identity . . . . . . . . . . 81 8.1. Location Attributes . . . . . . . . . . . . . . . . . . 77
7.6. Additional Client-Side Considerations . . . . . . . . . 82 8.2. File System Presence or Absence . . . . . . . . . . . . 77
7.7. Effecting File System Transitions . . . . . . . . . . . 83 8.3. Getting Attributes for an Absent File System . . . . . . 78
7.7.1. File System Transitions and Simultaneous Access . . 84 8.3.1. GETATTR Within an Absent File System . . . . . . . . 78
7.7.2. Filehandles and File System Transitions . . . . . . 85 8.3.2. READDIR and Absent File Systems . . . . . . . . . . 79
7.7.3. Fileids and File System Transitions . . . . . . . . 85 8.4. Uses of Location Information . . . . . . . . . . . . . . 80
7.7.4. Fsids and File System Transitions . . . . . . . . . 86 8.4.1. File System Replication . . . . . . . . . . . . . . 81
7.7.5. The Change Attribute and File System Transitions . . 87 8.4.2. File System Migration . . . . . . . . . . . . . . . 81
7.7.6. Lock State and File System Transitions . . . . . . . 87 8.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 82
7.7.7. Write Verifiers and File System Transitions . . . . 89 8.5. Location Entries and Server Identity . . . . . . . . . . 83
7.7.8. Readdir Cookies and Verifiers and File System 8.6. Additional Client-Side Considerations . . . . . . . . . 83
Transitions . . . . . . . . . . . . . . . . . . . . 89 8.7. Effecting File System Referrals . . . . . . . . . . . . 84
7.7.9. File System Data and File System Transitions . . . . 90 8.7.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 85
7.8. Effecting File System Referrals . . . . . . . . . . . . 91 8.7.2. Referral Example (READDIR) . . . . . . . . . . . . . 89
7.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 91 8.8. The Attribute fs_locations . . . . . . . . . . . . . . . 91
7.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 95 8.8.1. Inferring Transition Modes . . . . . . . . . . . . . 93
7.9. The Attribute fs_locations . . . . . . . . . . . . . . . 98 9. File Locking and Share Reservations . . . . . . . . . . . . . 94
7.9.1. Inferring Transition Modes . . . . . . . . . . . . . 99 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 95
8. NFS Server Name Space . . . . . . . . . . . . . . . . . . . . 101 9.1.1. Client ID . . . . . . . . . . . . . . . . . . . . . 96
8.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 101 9.1.2. Server Release of Client ID . . . . . . . . . . . . 99
8.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 101 9.1.3. Stateid Definition . . . . . . . . . . . . . . . . . 99
8.3. Server Pseudo Filesystem . . . . . . . . . . . . . . . . 101 9.1.4. lock-owner . . . . . . . . . . . . . . . . . . . . . 105
8.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 102 9.1.5. Use of the Stateid and Locking . . . . . . . . . . . 106
8.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 102 9.1.6. Sequencing of Lock Requests . . . . . . . . . . . . 108
8.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 102 9.1.7. Recovery from Replayed Requests . . . . . . . . . . 109
8.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 103 9.1.8. Interactions of multiple sequence values . . . . . . 109
8.8. Security Policy and Name Space Presentation . . . . . . 103 9.1.9. Releasing state-owner State . . . . . . . . . . . . 110
9. File Locking and Share Reservations . . . . . . . . . . . . . 104 9.1.10. Use of Open Confirmation . . . . . . . . . . . . . . 111
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 105 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 112
9.1.1. Client ID . . . . . . . . . . . . . . . . . . . . . 105 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 113
9.1.2. Server Release of Client ID . . . . . . . . . . . . 108 9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 113
9.1.3. Stateid Definition . . . . . . . . . . . . . . . . . 109 9.5. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 114
9.1.4. lock-owner . . . . . . . . . . . . . . . . . . . . . 115 9.6. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 115
9.1.5. Use of the Stateid and Locking . . . . . . . . . . . 116 9.6.1. Client Failure and Recovery . . . . . . . . . . . . 115
9.1.6. Sequencing of Lock Requests . . . . . . . . . . . . 118 9.6.2. Server Failure and Recovery . . . . . . . . . . . . 116
9.1.7. Recovery from Replayed Requests . . . . . . . . . . 119 9.6.3. Network Partitions and Recovery . . . . . . . . . . 117
9.1.8. Interactions of multiple sequence values . . . . . . 119 9.7. Recovery from a Lock Request Timeout or Abort . . . . . 125
9.1.9. Releasing state-owner State . . . . . . . . . . . . 120 9.8. Server Revocation of Locks . . . . . . . . . . . . . . . 125
9.1.10. Use of Open Confirmation . . . . . . . . . . . . . . 121 9.9. Share Reservations . . . . . . . . . . . . . . . . . . . 127
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 122 9.10. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 127
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 122 9.10.1. Close and Retention of State Information . . . . . . 128
9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 123 9.11. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 129
9.5. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 124 9.12. Short and Long Leases . . . . . . . . . . . . . . . . . 130
9.6. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 125
9.6.1. Client Failure and Recovery . . . . . . . . . . . . 125
9.6.2. Server Failure and Recovery . . . . . . . . . . . . 125
9.6.3. Network Partitions and Recovery . . . . . . . . . . 127
9.7. Recovery from a Lock Request Timeout or Abort . . . . . 135
9.8. Server Revocation of Locks . . . . . . . . . . . . . . . 135
9.9. Share Reservations . . . . . . . . . . . . . . . . . . . 136
9.10. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 137
9.10.1. Close and Retention of State Information . . . . . . 138
9.11. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 139
9.12. Short and Long Leases . . . . . . . . . . . . . . . . . 139
9.13. Clocks, Propagation Delay, and Calculating Lease 9.13. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 140 Expiration . . . . . . . . . . . . . . . . . . . . . . . 130
9.14. Migration, Replication and State . . . . . . . . . . . . 140 9.14. Migration, Replication and State . . . . . . . . . . . . 131
9.14.1. Migration and State . . . . . . . . . . . . . . . . 141 9.14.1. Migration and State . . . . . . . . . . . . . . . . 131
9.14.2. Replication and State . . . . . . . . . . . . . . . 142 9.14.2. Replication and State . . . . . . . . . . . . . . . 132
9.14.3. Notification of Migrated Lease . . . . . . . . . . . 142 9.14.3. Notification of Migrated Lease . . . . . . . . . . . 132
9.14.4. Migration and the Lease_time Attribute . . . . . . . 143 9.14.4. Migration and the Lease_time Attribute . . . . . . . 133
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 143 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 134
10.1. Performance Challenges for Client-Side Caching . . . . . 144 10.1. Performance Challenges for Client-Side Caching . . . . . 134
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 145 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 135
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 147 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 137
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 151 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 141
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 151 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 142
10.3.2. Data Caching and File Locking . . . . . . . . . . . 152 10.3.2. Data Caching and File Locking . . . . . . . . . . . 143
10.3.3. Data Caching and Mandatory File Locking . . . . . . 154 10.3.3. Data Caching and Mandatory File Locking . . . . . . 144
10.3.4. Data Caching and File Identity . . . . . . . . . . . 154 10.3.4. Data Caching and File Identity . . . . . . . . . . . 145
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 155 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 146
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 158 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 148
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 159 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 149
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 159 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 150
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 162 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 153
10.4.5. OPEN Delegation Race with CB_RECALL . . . . . . . . 164 10.4.5. OPEN Delegation Race with CB_RECALL . . . . . . . . 155
10.4.6. Clients that Fail to Honor Delegation Recalls . . . 165 10.4.6. Clients that Fail to Honor Delegation Recalls . . . 155
10.4.7. Delegation Revocation . . . . . . . . . . . . . . . 166 10.4.7. Delegation Revocation . . . . . . . . . . . . . . . 156
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 166 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 157
10.5.1. Revocation Recovery for Write Open Delegation . . . 167 10.5.1. Revocation Recovery for Write Open Delegation . . . 157
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 168 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 158
10.7. Data and Metadata Caching and Memory Mapped Files . . . 170 10.7. Data and Metadata Caching and Memory Mapped Files . . . 160
10.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 172 10.8. Name Caching . . . . . . . . . . . . . . . . . . . . . . 162
10.9. Directory Caching . . . . . . . . . . . . . . . . . . . 173 10.9. Directory Caching . . . . . . . . . . . . . . . . . . . 163
11. Minor Versioning . . . . . . . . . . . . . . . . . . . . . . 174
12. Internationalization . . . . . . . . . . . . . . . . . . . . 176 11. Minor Versioning . . . . . . . . . . . . . . . . . . . . . . 164
12.1. Use of UTF-8 . . . . . . . . . . . . . . . . . . . . . . 177 12. Internationalization . . . . . . . . . . . . . . . . . . . . 167
12.1.1. Relation to Stringprep . . . . . . . . . . . . . . . 177 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 167
12.1.2. Normalization, Equivalence, and Confusability . . . 178 12.2. String Encoding . . . . . . . . . . . . . . . . . . . . 167
12.2. String Type Overview . . . . . . . . . . . . . . . . . . 181 12.3. Normalization . . . . . . . . . . . . . . . . . . . . . 168
12.2.1. Overall String Class Divisions . . . . . . . . . . . 181 12.4. Types with Processing Defined by Other Internet Areas . 168
12.2.2. Divisions by Typedef Parent types . . . . . . . . . 182 12.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 169
12.2.3. Individual Types and Their Handling . . . . . . . . 183 13. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 170
12.3. Errors Related to Strings . . . . . . . . . . . . . . . 184 13.1. Error Definitions . . . . . . . . . . . . . . . . . . . 170
12.4. Types with Pre-processing to Resolve Mixture Issues . . 185 13.1.1. General Errors . . . . . . . . . . . . . . . . . . . 172
12.4.1. Processing of Principal Strings . . . . . . . . . . 185 13.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 173
12.4.2. Processing of Server Id Strings . . . . . . . . . . 186 13.1.3. Compound Structure Errors . . . . . . . . . . . . . 174
12.5. String Types without Internationalization Processing . . 186 13.1.4. File System Errors . . . . . . . . . . . . . . . . . 175
12.6. Types with Processing Defined by Other Internet Areas . 187 13.1.5. State Management Errors . . . . . . . . . . . . . . 177
12.7. String Types with NFS-specific Processing . . . . . . . 188 13.1.6. Security Errors . . . . . . . . . . . . . . . . . . 178
12.7.1. Handling of File Name Components . . . . . . . . . . 188 13.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 179
12.7.2. Processing of Link Text . . . . . . . . . . . . . . 197 13.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 179
12.7.3. Processing of Principal Prefixes . . . . . . . . . . 198 13.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 181
13. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 199 13.1.10. Client Management Errors . . . . . . . . . . . . . . 181
13.1. Error Definitions . . . . . . . . . . . . . . . . . . . 199 13.1.11. Attribute Handling Errors . . . . . . . . . . . . . 182
13.1.1. General Errors . . . . . . . . . . . . . . . . . . . 201 13.2. Operations and their valid errors . . . . . . . . . . . 182
13.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 202 13.3. Callback operations and their valid errors . . . . . . . 189
13.1.3. Compound Structure Errors . . . . . . . . . . . . . 204 13.4. Errors and the operations that use them . . . . . . . . 190
13.1.4. File System Errors . . . . . . . . . . . . . . . . . 204 14. NFSv4 Requests . . . . . . . . . . . . . . . . . . . . . . . 194
13.1.5. State Management Errors . . . . . . . . . . . . . . 206 14.1. Compound Procedure . . . . . . . . . . . . . . . . . . . 195
13.1.6. Security Errors . . . . . . . . . . . . . . . . . . 207 14.2. Evaluation of a Compound Request . . . . . . . . . . . . 195
13.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 208 14.3. Synchronous Modifying Operations . . . . . . . . . . . . 196
13.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 209 14.4. Operation Values . . . . . . . . . . . . . . . . . . . . 196
13.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 210 15. NFSv4 Procedures . . . . . . . . . . . . . . . . . . . . . . 197
13.1.10. Client Management Errors . . . . . . . . . . . . . . 211 15.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 197
13.1.11. Attribute Handling Errors . . . . . . . . . . . . . 211 15.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 197
13.2. Operations and their valid errors . . . . . . . . . . . 212 15.3. Operation 3: ACCESS - Check Access Rights . . . . . . . 201
13.3. Callback operations and their valid errors . . . . . . . 219 15.4. Operation 4: CLOSE - Close File . . . . . . . . . . . . 204
13.4. Errors and the operations that use them . . . . . . . . 219 15.5. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 205
14. NFSv4 Requests . . . . . . . . . . . . . . . . . . . . . . . 224 15.6. Operation 6: CREATE - Create a Non-Regular File Object . 207
14.1. Compound Procedure . . . . . . . . . . . . . . . . . . . 224
14.2. Evaluation of a Compound Request . . . . . . . . . . . . 225
14.3. Synchronous Modifying Operations . . . . . . . . . . . . 226
14.4. Operation Values . . . . . . . . . . . . . . . . . . . . 226
15. NFSv4 Procedures . . . . . . . . . . . . . . . . . . . . . . 226
15.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 226
15.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 227
15.3. Operation 3: ACCESS - Check Access Rights . . . . . . . 230
15.4. Operation 4: CLOSE - Close File . . . . . . . . . . . . 233
15.5. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 234
15.6. Operation 6: CREATE - Create a Non-Regular File Object . 237
15.7. Operation 7: DELEGPURGE - Purge Delegations Awaiting 15.7. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 239 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 210
15.8. Operation 8: DELEGRETURN - Return Delegation . . . . . . 241 15.8. Operation 8: DELEGRETURN - Return Delegation . . . . . . 211
15.9. Operation 9: GETATTR - Get Attributes . . . . . . . . . 241 15.9. Operation 9: GETATTR - Get Attributes . . . . . . . . . 212
15.10. Operation 10: GETFH - Get Current Filehandle . . . . . . 243 15.10. Operation 10: GETFH - Get Current Filehandle . . . . . . 214
15.11. Operation 11: LINK - Create Link to a File . . . . . . . 244 15.11. Operation 11: LINK - Create Link to a File . . . . . . . 215
15.12. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 246 15.12. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 216
15.13. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 250 15.13. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 220
15.14. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 252 15.14. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 222
15.15. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 253 15.15. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 223
15.16. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 255 15.16. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 225
15.17. Operation 17: NVERIFY - Verify Difference in 15.17. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 256 Attributes . . . . . . . . . . . . . . . . . . . . . . . 226
15.18. Operation 18: OPEN - Open a Regular File . . . . . . . . 257 15.18. Operation 18: OPEN - Open a Regular File . . . . . . . . 227
15.19. Operation 19: OPENATTR - Open Named Attribute 15.19. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 267 Directory . . . . . . . . . . . . . . . . . . . . . . . 237
15.20. Operation 20: OPEN_CONFIRM - Confirm Open . . . . . . . 268 15.20. Operation 20: OPEN_CONFIRM - Confirm Open . . . . . . . 238
15.21. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 270 15.21. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 240
15.22. Operation 22: PUTFH - Set Current Filehandle . . . . . . 271 15.22. Operation 22: PUTFH - Set Current Filehandle . . . . . . 241
15.23. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 272 15.23. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 242
15.24. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 273 15.24. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 243
15.25. Operation 25: READ - Read from File . . . . . . . . . . 274 15.25. Operation 25: READ - Read from File . . . . . . . . . . 244
15.26. Operation 26: READDIR - Read Directory . . . . . . . . . 276 15.26. Operation 26: READDIR - Read Directory . . . . . . . . . 246
15.27. Operation 27: READLINK - Read Symbolic Link . . . . . . 280 15.27. Operation 27: READLINK - Read Symbolic Link . . . . . . 250
15.28. Operation 28: REMOVE - Remove Filesystem Object . . . . 281 15.28. Operation 28: REMOVE - Remove Filesystem Object . . . . 251
15.29. Operation 29: RENAME - Rename Directory Entry . . . . . 283 15.29. Operation 29: RENAME - Rename Directory Entry . . . . . 253
15.30. Operation 30: RENEW - Renew a Lease . . . . . . . . . . 285 15.30. Operation 30: RENEW - Renew a Lease . . . . . . . . . . 255
15.31. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 286 15.31. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 256
15.32. Operation 32: SAVEFH - Save Current Filehandle . . . . . 287 15.32. Operation 32: SAVEFH - Save Current Filehandle . . . . . 257
15.33. Operation 33: SECINFO - Obtain Available Security . . . 288 15.33. Operation 33: SECINFO - Obtain Available Security . . . 258
15.34. Operation 34: SETATTR - Set Attributes . . . . . . . . . 292 15.34. Operation 34: SETATTR - Set Attributes . . . . . . . . . 262
15.35. Operation 35: SETCLIENTID - Negotiate Client ID . . . . 294 15.35. Operation 35: SETCLIENTID - Negotiate Client ID . . . . 264
15.36. Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID . 298 15.36. Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID . 268
15.37. Operation 37: VERIFY - Verify Same Attributes . . . . . 301 15.37. Operation 37: VERIFY - Verify Same Attributes . . . . . 271
15.38. Operation 38: WRITE - Write to File . . . . . . . . . . 303 15.38. Operation 38: WRITE - Write to File . . . . . . . . . . 273
15.39. Operation 39: RELEASE_LOCKOWNER - Release Lockowner 15.39. Operation 39: RELEASE_LOCKOWNER - Release Lockowner
State . . . . . . . . . . . . . . . . . . . . . . . . . 307 State . . . . . . . . . . . . . . . . . . . . . . . . . 277
15.40. Operation 10044: ILLEGAL - Illegal operation . . . . . . 308 15.40. Operation 10044: ILLEGAL - Illegal operation . . . . . . 278
16. NFSv4 Callback Procedures . . . . . . . . . . . . . . . . . . 309 16. NFSv4 Callback Procedures . . . . . . . . . . . . . . . . . . 279
16.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 309 16.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 279
16.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 309 16.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 279
16.2.6. Operation 3: CB_GETATTR - Get Attributes . . . . . . 311 16.2.6. Operation 3: CB_GETATTR - Get Attributes . . . . . . 281
16.2.7. Operation 4: CB_RECALL - Recall an Open Delegation . 312 16.2.7. Operation 4: CB_RECALL - Recall an Open Delegation . 282
16.2.8. Operation 10044: CB_ILLEGAL - Illegal Callback 16.2.8. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . 313 Operation . . . . . . . . . . . . . . . . . . . . . 283
17. Security Considerations . . . . . . . . . . . . . . . . . . . 314 17. Security Considerations . . . . . . . . . . . . . . . . . . . 284
18. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 316 18. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 286
18.1. Named Attribute Definitions . . . . . . . . . . . . . . 316 18.1. Named Attribute Definitions . . . . . . . . . . . . . . 286
18.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 317 18.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 287
18.1.2. Updating Registrations . . . . . . . . . . . . . . . 317 18.1.2. Updating Registrations . . . . . . . . . . . . . . . 287
19. References . . . . . . . . . . . . . . . . . . . . . . . . . 317 19. References . . . . . . . . . . . . . . . . . . . . . . . . . 287
19.1. Normative References . . . . . . . . . . . . . . . . . . 317 19.1. Normative References . . . . . . . . . . . . . . . . . . 287
19.2. Informative References . . . . . . . . . . . . . . . . . 318 19.2. Informative References . . . . . . . . . . . . . . . . . 288
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 321 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 291
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 322 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 292
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 322 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 292
1. Introduction 1. Introduction
1.1. NFS Version 4 Goals 1.1. NFS Version 4 Goals
The Network Filesystem version 4 (NFSv4) protocol is a further The Network Filesystem version 4 (NFSv4) protocol is a further
revision of the NFS protocol defined already by versions 2 [RFC1094] revision of the NFS protocol defined already by versions 2 [RFC1094]
and 3 [RFC1813]. It retains the essential characteristics of and 3 [RFC1813]. It retains the essential characteristics of
previous versions: design for easy recovery, independent of transport previous versions: design for easy recovery, independent of transport
protocols, operating systems and filesystems, simplicity, and good protocols, operating systems and file systems, simplicity, and good
performance. The NFSv4 revision has the following goals: performance. The NFSv4 revision has the following goals:
o Improved access and good performance on the Internet. o Improved access and good performance on the Internet.
The protocol is designed to transit firewalls easily, perform well The protocol is designed to transit firewalls easily, perform well
where latency is high and bandwidth is low, and scale to very where latency is high and bandwidth is low, and scale to very
large numbers of clients per server. large numbers of clients per server.
o Strong security with negotiation built into the protocol. o Strong security with negotiation built into the protocol.
The protocol builds on the work of the ONCRPC working group in The protocol builds on the work of the Open Network Computing
supporting the RPCSEC_GSS protocol. Additionally, the NFS version (ONC) Remote Procedure Call (RPC) working group in supporting the
4 protocol provides a mechanism to allow clients and servers the RPCSEC_GSS protocol (see both [RFC2203] and [RFC5403]).
ability to negotiate security and require clients and servers to Additionally, the NFS version 4 protocol provides a mechanism to
support a minimal set of security schemes. allow clients and servers the ability to negotiate security and
require clients and servers to support a minimal set of security
schemes.
o Good cross-platform interoperability. o Good cross-platform interoperability.
The protocol features a filesystem model that provides a useful, The protocol features a file system model that provides a useful,
common set of features that does not unduly favor one filesystem common set of features that does not unduly favor one file system
or operating system over another. or operating system over another.
o Designed for protocol extensions. o Designed for protocol extensions.
The protocol is designed to accept standard extensions that do not The protocol is designed to accept standard extensions that do not
compromise backward compatibility. compromise backward compatibility.
This document, together with the companion XDR description document This document, together with the companion XDR description document
[I-D.ietf-nfsv4-rfc3530bis-dot-x], obsoletes RFC 3530 [RFC3530] as [I-D.ietf-nfsv4-rfc3530bis-dot-x], obsoletes RFC 3530 [RFC3530] as
the authoritative document describing NFSv4. It does not introduce the authoritative document describing NFSv4. It does not introduce
any over-the-wire protocol changes, in the sense that previously any over-the-wire protocol changes, in the sense that previously
valid requests requests remain valid. However, some requests valid requests remain valid.
previously defined as invalid, although not generally rejected, are
now explicitly allowed, in that internationalization handling has
been generalized and liberalized.
1.2. Inconsistencies of this Document with the companion document NFS 1.2. Definitions in the companion document NFS Version 4 Protocol are
Version 4 Protocol Authoritative
[I-D.ietf-nfsv4-rfc3530bis-dot-x], NFS Version 4 Protocol, contains [I-D.ietf-nfsv4-rfc3530bis-dot-x], NFS Version 4 Protocol, contains
the definitions in XDR description language of the constructs used by the definitions in XDR description language of the constructs used by
the protocol. Inside this document, several of the constructs are the protocol. Inside this document, several of the constructs are
reproduced for purposes of explanation. The reader is warned of the reproduced for purposes of explanation. The reader is warned of the
possibility of errors in the reproduced constructs outside of possibility of errors in the reproduced constructs outside of
[I-D.ietf-nfsv4-rfc3530bis-dot-x]. For any part of the document that [I-D.ietf-nfsv4-rfc3530bis-dot-x]. For any part of the document that
is inconsistent with [I-D.ietf-nfsv4-rfc3530bis-dot-x], is inconsistent with [I-D.ietf-nfsv4-rfc3530bis-dot-x],
[I-D.ietf-nfsv4-rfc3530bis-dot-x] is to be considered authoritative. [I-D.ietf-nfsv4-rfc3530bis-dot-x] is to be considered authoritative.
1.3. Overview of NFSv4 Features 1.3. Overview of NFSv4 Features
To provide a reasonable context for the reader, the major features of To provide a reasonable context for the reader, the major features of
NFSv4 protocol will be reviewed in brief. This will be done to NFSv4 protocol will be reviewed in brief. This will be done to
provide an appropriate context for both the reader who is familiar provide an appropriate context for both the reader who is familiar
with the previous versions of the NFS protocol and the reader that is with the previous versions of the NFS protocol and the reader who is
new to the NFS protocols. For the reader new to the NFS protocols, new to the NFS protocols. For the reader new to the NFS protocols,
some fundamental knowledge is still expected. The reader should be some fundamental knowledge is still expected. The reader should be
familiar with the XDR and RPC protocols as described in [RFC5531] and familiar with the XDR and RPC protocols as described in [RFC5531] and
[RFC4506]. A basic knowledge of filesystems and distributed [RFC4506]. A basic knowledge of file systems and distributed file
filesystems is expected as well. systems is expected as well.
1.3.1. RPC and Security 1.3.1. RPC and Security
As with previous versions of NFS, the External Data Representation As with previous versions of NFS, the External Data Representation
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4 (XDR) and RPC mechanisms used for the NFSv4 protocol are those
protocol are those defined in [RFC5531] and [RFC4506]. To meet end defined in [RFC5531] and [RFC4506]. To meet end to end security
to end security requirements, the RPCSEC_GSS framework [RFC2203] will requirements, the RPCSEC_GSS framework (both version 1 in [RFC2203]
be used to extend the basic RPC security. With the use of and version 2 in [RFC5403]) will be used to extend the basic RPC
RPCSEC_GSS, various mechanisms can be provided to offer security. With the use of RPCSEC_GSS, various mechanisms can be
authentication, integrity, and privacy to the NFS version 4 protocol. provided to offer authentication, integrity, and privacy to the NFS
Kerberos V5 will be used as described in [RFC4121] to provide one version 4 protocol. Kerberos V5 will be used as described in
security framework. With the use of RPCSEC_GSS, other mechanisms may [RFC4121] to provide one security framework. With the use of
also be specified and used for NFS version 4 security. RPCSEC_GSS, other mechanisms may also be specified and used for NFS
version 4 security.
To enable in-band security negotiation, the NFSv4 protocol has added To enable in-band security negotiation, the NFSv4 protocol has added
a new operation which provides the client a method of querying the a new operation which provides the client with a method of querying
server about its policies regarding which security mechanisms must be the server about its policies regarding which security mechanisms
used for access to the server's filesystem resources. With this, the must be used for access to the server's file system resources. With
client can securely match the security mechanism that meets the this, the client can securely match the security mechanism that meets
policies specified at both the client and server. the policies specified at both the client and server.
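The matching step described above can be sketched as follows. This is an illustrative sketch only, not part of the protocol definition. The function name choose_mechanism, the string labels for the mechanisms, and the assumption that the server's list is ordered by preference are all hypothetical. The operation that returns the server's list appears in the table of contents as SECINFO.

```python
from typing import Iterable, Optional, Set


def choose_mechanism(server_offered: Iterable[str],
                     client_allowed: Set[str]) -> Optional[str]:
    """Pick a security mechanism acceptable to both sides.

    server_offered: mechanisms the server reports as acceptable for the
                    object in question (assumed here to be in the
                    server's order of preference).
    client_allowed: mechanisms permitted by the client's local policy.

    Returns the first server-offered mechanism the client also allows,
    or None when the two policies have nothing in common.
    """
    for mechanism in server_offered:
        if mechanism in client_allowed:
            return mechanism
    return None
```

When the result is None, the client has no mechanism that satisfies both policies and cannot access the resource.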
1.3.2. Procedure and Operation Structure 1.3.2. Procedure and Operation Structure
A significant departure from the previous versions of the NFS A significant departure from the previous versions of the NFS
protocol is the introduction of the COMPOUND procedure. For the protocol is the introduction of the COMPOUND procedure. For the
NFSv4 protocol, there are two RPC procedures, NULL and COMPOUND. The NFSv4 protocol, there are two RPC procedures, NULL and COMPOUND. The
COMPOUND procedure is defined in terms of operations and these COMPOUND procedure is defined in terms of operations and these
operations correspond more closely to the traditional NFS procedures. operations correspond more closely to the traditional NFS procedures.
With the use of the COMPOUND procedure, the client is able to build With the use of the COMPOUND procedure, the client is able to build
simple or complex requests. These COMPOUND requests allow for a simple or complex requests. These COMPOUND requests allow for a
reduction in the number of RPCs needed for logical filesystem reduction in the number of RPCs needed for logical file system
operations. For example, without previous contact with a server a operations. For example, without previous contact with a server a
client will be able to read data from a file in one request by client will be able to read data from a file in one request by
combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC. combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC.
With previous versions of the NFS protocol, this type of single With previous versions of the NFS protocol, this type of single
request was not possible. request was not possible.
The model used for COMPOUND is very simple. There is no logical OR The model used for COMPOUND is very simple. There is no logical OR
or ANDing of operations. The operations combined within a COMPOUND or ANDing of operations. The operations combined within a COMPOUND
request are evaluated in order by the server. Once an operation request are evaluated in order by the server. Once an operation
returns a failing result, the evaluation ends and the results of all returns a failing result, the evaluation ends and the results of all
evaluated operations are returned to the client. evaluated operations are returned to the client.
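The evaluation rule above can be sketched as a short loop. This is a minimal sketch under stated assumptions. The names evaluate_compound and demo are hypothetical, and each operation is modeled as a callable that returns a status code. NFS4_OK and NFS4ERR_NOENT carry the values used by the protocol's status enumeration.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0          # success status
NFS4ERR_NOENT = 2    # "no such file or directory" status

# Hypothetical model: an operation is a (name, callable) pair and the
# callable returns the operation's status code.
Operation = Tuple[str, Callable[[], int]]


def evaluate_compound(ops: List[Operation]) -> List[Tuple[str, int]]:
    """Evaluate operations in order and stop at the first failure.

    The result list covers every operation that was evaluated,
    including the failing one; later operations are never run.
    """
    results = []
    for name, run in ops:
        status = run()
        results.append((name, status))
        if status != NFS4_OK:
            break
    return results


def demo():
    """Build a three-operation request whose second operation fails."""
    executed = []

    def op(name: str, status: int) -> Operation:
        def run() -> int:
            executed.append(name)
            return status
        return (name, run)

    results = evaluate_compound([
        op("PUTROOTFH", NFS4_OK),
        op("LOOKUP", NFS4ERR_NOENT),
        op("READ", NFS4_OK),
    ])
    return executed, results
```

In demo(), READ is never evaluated because LOOKUP fails, so the reply holds only the results for PUTROOTFH and LOOKUP.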
The NFSv4 protocol continues to have the client refer to a file or The NFSv4 protocol continues to have the client refer to a file or
directory at the server by a "filehandle". The COMPOUND procedure directory at the server by a "filehandle". The COMPOUND procedure
has a method of passing a filehandle from one operation to another has a method of passing a filehandle from one operation to another
within the sequence of operations. There is a concept of a "current within the sequence of operations. There is a concept of a "current
filehandle" and "saved filehandle". Most operations use the "current filehandle" and "saved filehandle". Most operations use the "current
filehandle" as the filesystem object to operate upon. The "saved filehandle" as the file system object to operate upon. The "saved
filehandle" is used as temporary filehandle storage within a COMPOUND filehandle" is used as temporary filehandle storage within a COMPOUND
procedure as well as an additional operand for certain operations. procedure as well as an additional operand for certain operations.
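The current and saved filehandles can be pictured as two slots of per-request state. The sketch below is illustrative only. The class CompoundState and the function two_operand_op are hypothetical, filehandles are modeled as opaque bytes, and error handling (for example, an operation issued with no current filehandle) is omitted. The functions putfh, savefh, and restorefh model the PUTFH, SAVEFH, and RESTOREFH operations listed in the table of contents.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CompoundState:
    """Filehandle state carried across the operations of one COMPOUND."""
    current_fh: Optional[bytes] = None
    saved_fh: Optional[bytes] = None


def putfh(state: CompoundState, fh: bytes) -> None:
    """Model of PUTFH: set the current filehandle."""
    state.current_fh = fh


def savefh(state: CompoundState) -> None:
    """Model of SAVEFH: copy the current filehandle into the saved slot."""
    state.saved_fh = state.current_fh


def restorefh(state: CompoundState) -> None:
    """Model of RESTOREFH: copy the saved filehandle back to current."""
    state.current_fh = state.saved_fh


def two_operand_op(state: CompoundState) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Hypothetical stand-in for an operation that takes two filehandles.

    It reads the saved filehandle as its first operand and the current
    filehandle as its second operand.
    """
    return (state.saved_fh, state.current_fh)
```

A client that needs two filehandles for one operation would issue PUTFH for the first, SAVEFH, PUTFH for the second, and then the two-operand operation.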
1.3.3. Filesystem Model 1.3.3. Filesystem Model
The general filesystem model used for the NFSv4 protocol is the same The general file system model used for the NFSv4 protocol is the same
as previous versions. The server filesystem is hierarchical with the as previous versions. The server file system is hierarchical with
regular files contained within being treated as opaque byte streams. the regular files contained within being treated as opaque byte
In a slight departure, file and directory names are encoded with streams. In a slight departure, file and directory names are encoded
UTF-8 to deal with the basics of internationalization. with UTF-8 to deal with the basics of internationalization.
The NFSv4 protocol does not require a separate protocol to provide The NFSv4 protocol does not require a separate protocol to provide
for the initial mapping between path name and filehandle. Instead of for the initial mapping between path name and filehandle. Instead of
using the older MOUNT protocol for this mapping, the server provides using the older MOUNT protocol for this mapping, the server provides
a ROOT filehandle that represents the logical root or top of the a ROOT filehandle that represents the logical root or top of the file
filesystem tree provided by the server. The server provides multiple system tree provided by the server. The server provides multiple
filesystems by gluing them together with pseudo filesystems. These file systems by gluing them together with pseudo file systems. These
pseudo filesystems provide for potential gaps in the path names pseudo file systems provide for potential gaps in the path names
between real filesystems. between real file systems.
1.3.3.1. Filehandle Types 1.3.3.1. Filehandle Types
In previous versions of the NFS protocol, the filehandle provided by In previous versions of the NFS protocol, the filehandle provided by
the server was guaranteed to be valid or persistent for the lifetime the server was guaranteed to be valid or persistent for the lifetime
of the filesystem object to which it referred. For some server of the file system object to which it referred. For some server
implementations, this persistence requirement has been difficult to implementations, this persistence requirement has been difficult to
meet. For the NFSv4 protocol, this requirement has been relaxed by meet. For the NFSv4 protocol, this requirement has been relaxed by
introducing another type of filehandle, volatile. With persistent introducing another type of filehandle, volatile. With persistent
and volatile filehandle types, the server implementation can match and volatile filehandle types, the server implementation can match
the abilities of the filesystem at the server along with the the abilities of the file system at the server along with the
operating environment. The client will have knowledge of the type of operating environment. The client will have knowledge of the type of
filehandle being provided by the server and can be prepared to deal filehandle being provided by the server and can be prepared to deal
with the semantics of each. with the semantics of each.
1.3.3.2. Attribute Types 1.3.3.2. Attribute Types
The NFSv4 protocol has a rich and extensible file object attribute The NFSv4 protocol has a rich and extensible file object attribute
structure, which is divided into REQUIRED, RECOMMENDED, and named structure, which is divided into REQUIRED, RECOMMENDED, and named
attributes (see Section 5). attributes (see Section 5).
Several (but not all) of the REQUIRED attributes are derived from the Several (but not all) of the REQUIRED attributes are derived from the
attributes of NFSv3 (see definition of the fattr3 data type in attributes of NFSv3 (see definition of the fattr3 data type in
[RFC1813]). An example of a REQUIRED attribute is the file object's [RFC1813]). An example of a REQUIRED attribute is the file object's
type (Section 5.8.1.2) so that regular files can be distinguished type (Section 5.8.1.2) so that regular files can be distinguished
from directories (also known as folders in some operating from directories (also known as folders in some operating
environments) and other types of objects. REQUIRED attributes are environments) and other types of objects. REQUIRED attributes are
discussed in Section 5.1. discussed in Section 5.1.
An example of the RECOMMENDED attributes is an acl. This attribute An example of the RECOMMENDED attributes is an acl (Section 6.2.1).
defines an Access Control List (ACL) on a file object ((Section 6). This attribute defines an Access Control List (ACL) on a file object.
An ACL provides file access control beyond the model used in NFSv3. An ACL provides file access control beyond the model used in NFSv3.
The ACL definition allows for specification of specific sets of The ACL definition allows for specification of specific sets of
permissions for individual users and groups. In addition, ACL permissions for individual users and groups. In addition, ACL
inheritance allows propagation of access permissions and restriction inheritance allows propagation of access permissions and restriction
down a directory tree as file system objects are created. down a directory tree as file system objects are created.
RECOMMENDED attributes are discussed in Section 5.2. RECOMMENDED attributes are discussed in Section 5.2.
A named attribute is an opaque byte stream that is associated with a A named attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate are meant to be used by client applications as a method to associate
skipping to change at page 14, line 11 skipping to change at page 12, line 51
state associated with the client's lease may be released by the state associated with the client's lease may be released by the
server. The client may renew its lease with use of the RENEW server. The client may renew its lease with use of the RENEW
operation or implicitly by use of other operations (primarily READ). operation or implicitly by use of other operations (primarily READ).
1.3.6. Client Caching and Delegation 1.3.6. Client Caching and Delegation
The file, attribute, and directory caching for the NFSv4 protocol is The file, attribute, and directory caching for the NFSv4 protocol is
similar to previous versions. Attributes and directory information similar to previous versions. Attributes and directory information
are cached for a duration determined by the client. At the end of a are cached for a duration determined by the client. At the end of a
predefined timeout, the client will query the server to see if the predefined timeout, the client will query the server to see if the
related filesystem object has been updated. related file system object has been updated.
For file data, the client checks its cache validity when the file is For file data, the client checks its cache validity when the file is
opened. A query is sent to the server to determine if the file has opened. A query is sent to the server to determine if the file has
been changed. Based on this information, the client determines if been changed. Based on this information, the client determines if
the data cache for the file should be kept or released. Also, when the the data cache for the file should be kept or released. Also, when the
file is closed, any modified data is written to the server. file is closed, any modified data is written to the server.
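The open-time check and close-time flush described above can be sketched as follows. This is a sketch under stated assumptions, not a description of any particular client. The class CachedFile, the functions on_open and on_close, and the callables query_change and write_to_server are hypothetical. The text above does not say what the query returns; the sketch assumes an opaque change token that differs whenever the file has been modified.

```python
from typing import Callable, Dict, Optional


class CachedFile:
    """Client-side cache state for one file (hypothetical model)."""

    def __init__(self) -> None:
        self.change: Optional[int] = None   # token recorded when data was cached
        self.data: Dict[int, bytes] = {}    # cached data, keyed by offset
        self.dirty: Dict[int, bytes] = {}   # modified data not yet written


def on_open(cache: CachedFile, query_change: Callable[[], int]) -> bool:
    """Validate the data cache when the file is opened.

    Returns True if the cached data was kept, False if it was released
    because the server reports a different change token.
    """
    current = query_change()
    if cache.change != current:
        cache.data.clear()
        cache.change = current
        return False
    return True


def on_close(cache: CachedFile,
             write_to_server: Callable[[int, bytes], None]) -> None:
    """Write all modified data to the server when the file is closed."""
    for offset, payload in sorted(cache.dirty.items()):
        write_to_server(offset, payload)
    cache.dirty.clear()
```

The sketch does not address what happens to modified data that is still pending when the cache is released at open time.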
If an application wants to serialize access to file data, file If an application wants to serialize access to file data, file
locking of the file data ranges in question should be used. locking of the file data ranges in question should be used.
skipping to change at page 14, line 49 skipping to change at page 13, line 41
exist, then delegations cannot be granted. The essence of a exist, then delegations cannot be granted. The essence of a
delegation is that it allows the client to locally service operations delegation is that it allows the client to locally service operations
such as OPEN, CLOSE, LOCK, LOCKU, READ, or WRITE without immediate such as OPEN, CLOSE, LOCK, LOCKU, READ, or WRITE without immediate
interaction with the server. interaction with the server.
1.4. General Definitions 1.4. General Definitions
The following definitions are provided for the purpose of providing The following definitions are provided for the purpose of providing
an appropriate context for the reader. an appropriate context for the reader.
Absent File System: A file system is "absent" when a namespace
component does not have a backing file system.
Byte: In this document, a byte is an octet, i.e., a datum exactly 8 Byte: In this document, a byte is an octet, i.e., a datum exactly 8
bits in length. bits in length.
Client: The client is the entity that accesses the NFS server's Client: The client is the entity that accesses the NFS server's
resources. The client may be an application that contains the resources. The client may be an application that contains the
logic to access the NFS server directly. The client may also be logic to access the NFS server directly. The client may also be
the traditional operating system client that provides remote the traditional operating system client that provides remote file
filesystem services for a set of applications. system services for a set of applications.
With reference to byte-range locking, the client is also the With reference to byte-range locking, the client is also the
entity that maintains a set of locks on behalf of one or more entity that maintains a set of locks on behalf of one or more
applications. This client is responsible for crash or failure applications. This client is responsible for crash or failure
recovery for those locks it manages. recovery for those locks it manages.
Note that multiple clients may share the same transport and Note that multiple clients may share the same transport and
connection and multiple clients may exist on the same network connection and multiple clients may exist on the same network
node. node.
skipping to change at page 15, line 43 skipping to change at page 14, line 37
All leases granted by a server have the same fixed interval. Note All leases granted by a server have the same fixed interval. Note
that the fixed interval was chosen to alleviate the expense a that the fixed interval was chosen to alleviate the expense a
server would have in maintaining state about variable length server would have in maintaining state about variable length
leases across server failures. leases across server failures.
Lock: The term "lock" is used to refer to both record (byte-range) Lock: The term "lock" is used to refer to both record (byte-range)
locks as well as share reservations unless specifically stated locks as well as share reservations unless specifically stated
otherwise. otherwise.
Server: The "Server" is the entity responsible for coordinating Server: The "Server" is the entity responsible for coordinating
client access to a set of filesystems. client access to a set of file systems.
Stable Storage: NFSv4 servers must be able to recover without data Stable Storage: NFSv4 servers must be able to recover without data
loss from multiple power failures (including cascading power loss from multiple power failures (including cascading power
failures, that is, several power failures in quick succession), failures, that is, several power failures in quick succession),
operating system failures, and hardware failure of components operating system failures, and hardware failure of components
other than the storage medium itself (for example, disk, other than the storage medium itself (for example, disk,
nonvolatile RAM). nonvolatile RAM).
Some examples of stable storage that are allowable for an NFS Some examples of stable storage that are allowable for an NFS
server include: server include:
skipping to change at page 16, line 38 skipping to change at page 15, line 33
1.5. Changes since RFC 3530 1.5. Changes since RFC 3530
The main changes from RFC 3530 [RFC3530] are: The main changes from RFC 3530 [RFC3530] are:
o The XDR definition has been moved to a companion document o The XDR definition has been moved to a companion document
[I-D.ietf-nfsv4-rfc3530bis-dot-x] [I-D.ietf-nfsv4-rfc3530bis-dot-x]
o Updates for the latest IETF intellectual property statements o Updates for the latest IETF intellectual property statements
o There is a restructured and more complete explanation of multi- o There is a restructured and more complete explanation of multi-
server namespace features. In particular, this explanation server namespace features.
explicitly describes handling of inter-server referrals, even
where neither migration nor replication is involved.
o More liberal handling of internationalization for file names and
user and group names, with the elimination of restrictions imposed
by stringprep, with the recognition that rules for the forms of
these name are the province of the receiving entity.
o Updating handling of domain names to reflect IDNA [RFC5891].
o Restructuring of string types to more appropriately reflect the o Updating handling of domain names to reflect Internationalized
reality of required string processing. Domain Names in Applications (IDNA) [RFC5891].
o The previously required LIPKEY and SPKM-3 security mechanisms have o The previously required LIPKEY and SPKM-3 security mechanisms have
been removed. been removed.
o Some clarification on a client re-establishing callback o Some clarification on a client re-establishing callback
information to the new server if state has been migrated. information to the new server if state has been migrated.
o A third edge case was added for Courtesy locks and network o A third edge case was added for Courtesy locks and network
partitions. partitions.
skipping to change at page 18, line 22 skipping to change at page 17, line 11
o RENEW operation changes to identify the client correctly and allow o RENEW operation changes to identify the client correctly and allow
for additional error returns. for additional error returns.
o Verify error return possibilities for all operations. o Verify error return possibilities for all operations.
o Remove use of the pathname4 data type from LOOKUP and OPEN in o Remove use of the pathname4 data type from LOOKUP and OPEN in
favor of having the client construct a sequence of LOOKUP favor of having the client construct a sequence of LOOKUP
operations to achieve the same effect. operations to achieve the same effect.
o Clarification of the internationalization issues and adoption of
the new stringprep profile framework.
2. Protocol Data Types 2. Protocol Data Types
The syntax and semantics to describe the data types of the NFS The syntax and semantics to describe the data types of the NFS
version 4 protocol are defined in the XDR [RFC4506] and RPC [RFC5531] version 4 protocol are defined in the XDR [RFC4506] and RPC [RFC5531]
documents. The next sections build upon the XDR data types to define documents. The next sections build upon the XDR data types to define
types and structures specific to this protocol. types and structures specific to this protocol.
2.1. Basic Data Types 2.1. Basic Data Types
These are the base NFSv4 data types. These are the base NFSv4 data types.
+----------------------+--------------------------------------------+ +-----------------+-------------------------------------------------+
| Data Type | Definition | | Data Type | Definition |
+----------------------+--------------------------------------------+ +-----------------+-------------------------------------------------+
| int32_t | typedef int int32_t; | | int32_t | typedef int int32_t; |
| uint32_t | typedef unsigned int uint32_t; | | uint32_t | typedef unsigned int uint32_t; |
| int64_t | typedef hyper int64_t; | | int64_t | typedef hyper int64_t; |
| uint64_t | typedef unsigned hyper uint64_t; | | uint64_t | typedef unsigned hyper uint64_t; |
| attrlist4 | typedef opaque attrlist4<>; | | attrlist4 | typedef opaque attrlist4<>; |
| | Used for file/directory attributes. | | | Used for file/directory attributes. |
| bitmap4 | typedef uint32_t bitmap4<>; | | bitmap4 | typedef uint32_t bitmap4<>; |
| | Used in attribute array encoding. | | | Used in attribute array encoding. |
| changeid4 | typedef uint64_t changeid4; | | changeid4 | typedef uint64_t changeid4; |
| | Used in the definition of change_info4. | | | Used in the definition of change_info4. |
| clientid4 | typedef uint64_t clientid4; | | clientid4 | typedef uint64_t clientid4; |
| | Shorthand reference to client | | | Shorthand reference to client identification. |
| | identification. | | count4 | typedef uint32_t count4; |
| count4 | typedef uint32_t count4; | | | Various count parameters (READ, WRITE, COMMIT). |
| | Various count parameters (READ, WRITE, | | length4 | typedef uint64_t length4; |
| | COMMIT). | | | Describes LOCK lengths. |
| length4 | typedef uint64_t length4; | | mode4 | typedef uint32_t mode4; |
| | Describes LOCK lengths. | | | Mode attribute data type. |
| mode4 | typedef uint32_t mode4; | | nfs_cookie4 | typedef uint64_t nfs_cookie4; |
| | Mode attribute data type. | | | Opaque cookie value for READDIR. |
| nfs_cookie4 | typedef uint64_t nfs_cookie4; | | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; |
| | Opaque cookie value for READDIR. | | | Filehandle definition. |
| nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | nfs_ftype4 | enum nfs_ftype4; |
| | Filehandle definition. | | | Various defined file types. |
| nfs_ftype4 | enum nfs_ftype4; | | nfsstat4 | enum nfsstat4; |
| | Various defined file types. | | | Return value for operations. |
| nfsstat4 | enum nfsstat4; | | offset4 | typedef uint64_t offset4; |
| | Return value for operations. | | | Various offset designations (READ, WRITE, LOCK, |
| offset4 | typedef uint64_t offset4; | | | COMMIT). |
| | Various offset designations (READ, WRITE, | | qop4 | typedef uint32_t qop4; |
| | LOCK, COMMIT). | | | Quality of protection designation in SECINFO. |
| qop4 | typedef uint32_t qop4; | | sec_oid4 | typedef opaque sec_oid4<>; |
| | Quality of protection designation in | | | Security Object Identifier. The sec_oid4 data |
| | SECINFO. | | | type is not really opaque. Instead it contains |
| sec_oid4 | typedef opaque sec_oid4<>; | | | an ASN.1 OBJECT IDENTIFIER as used by GSS-API |
| | Security Object Identifier. The sec_oid4 | | | in the mech_type argument to |
| | data type is not really opaque. Instead | | | GSS_Init_sec_context. See [RFC2743] for |
| | it contains an ASN.1 OBJECT IDENTIFIER as | | | details. |
| | used by GSS-API in the mech_type argument | | seqid4 | typedef uint32_t seqid4; |
| | to GSS_Init_sec_context. See [RFC2743] | | | Sequence identifier used for file locking. |
| | for details. | | utf8string | typedef opaque utf8string<>; |
| seqid4 | typedef uint32_t seqid4; | | | UTF-8 encoding for strings. |
| | Sequence identifier used for file locking. | | utf8str_cis | typedef utf8string utf8str_cis; |
| utf8string | typedef opaque utf8string<>; | | | Case-insensitive UTF-8 string. |
| | UTF-8 encoding for strings. | | utf8str_cs | typedef utf8string utf8str_cs; |
| utf8_expected | typedef utf8string utf8_expected; | | | Case-sensitive UTF-8 string. |
| | String expected to be UTF-8 but no | | utf8str_mixed | typedef utf8string utf8str_mixed; |
| | validation | | | UTF-8 strings with a case-sensitive prefix and |
| utf8val_RECOMMENDED4 | typedef utf8string utf8val_RECOMMENDED4; | | | a case-insensitive suffix. |
| | String SHOULD be sent UTF-8 and SHOULD be | | component4 | typedef utf8str_cs component4; |
| | validated | | | Represents pathname components. |
| utf8val_REQUIRED4 | typedef utf8string utf8val_REQUIRED4; | | linktext4 | typedef utf8str_cs linktext4; |
| | String MUST be sent UTF-8 and MUST be | | | Symbolic link contents ("symbolic link" is |
| | validated | | | defined in an Open Group [openg_symlink] |
| ascii_REQUIRED4 | typedef utf8string ascii_REQUIRED4; | | | standard). |
| | String MUST be sent as ASCII and thus is | | ascii_REQUIRED4 | typedef utf8string ascii_REQUIRED4; |
| | automatically UTF-8 | | | String MUST be sent as ASCII and thus is |
| comptag4 | typedef utf8_expected comptag4; | | | automatically UTF-8. |
| | Tag should be UTF-8 but is not checked | | pathname4 | typedef component4 pathname4<>; |
| component4 | typedef utf8val_RECOMMENDED4 component4; | | | Represents path name for fs_locations. |
| | Represents path name components. | | nfs_lockid4 | typedef uint64_t nfs_lockid4; |
| linktext4 | typedef utf8val_RECOMMENDED4 linktext4; | | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; |
| | Symbolic link contents. | | | Verifier used for various operations (COMMIT, |
| pathname4 | typedef component4 pathname4<>; | | | CREATE, OPEN, READDIR, WRITE) |
| | Represents path name for fs_locations. | | | NFS4_VERIFIER_SIZE is defined as 8. |
| nfs_lockid4 | typedef uint64_t nfs_lockid4; | +-----------------+-------------------------------------------------+
| verifier4 | typedef opaque |
| | verifier4[NFS4_VERIFIER_SIZE]; |
| | Verifier used for various operations |
| | (COMMIT, CREATE, OPEN, READDIR, WRITE) |
| | NFS4_VERIFIER_SIZE is defined as 8. |
+----------------------+--------------------------------------------+
End of Base Data Types End of Base Data Types
Table 1 Table 1
2.2. Structured Data Types 2.2. Structured Data Types
2.2.1. nfstime4 2.2.1. nfstime4
struct nfstime4 { struct nfstime4 {
skipping to change at page 20, line 44 skipping to change at page 19, line 29
both cases, the nseconds field is to be added to the seconds field both cases, the nseconds field is to be added to the seconds field
for the final time representation. For example, if the time to be for the final time representation. For example, if the time to be
represented is one-half second before 0 hour January 1, 1970, the represented is one-half second before 0 hour January 1, 1970, the
seconds field would have a value of negative one (-1) and the seconds field would have a value of negative one (-1) and the
nseconds fields would have a value of one-half second (500000000). nseconds fields would have a value of one-half second (500000000).
Values greater than 999,999,999 for nseconds are considered invalid. Values greater than 999,999,999 for nseconds are considered invalid.
This data type is used to pass time and date information. A server This data type is used to pass time and date information. A server
converts to and from its local representation of time when processing converts to and from its local representation of time when processing
time values, preserving as much accuracy as possible. If the time values, preserving as much accuracy as possible. If the
precision of timestamps stored for a filesystem object is less than precision of timestamps stored for a file system object is less than
defined, loss of precision can occur. An adjunct time maintenance defined, loss of precision can occur. An adjunct time maintenance
protocol is recommended to reduce client and server time skew. protocol is recommended to reduce client and server time skew.
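
The seconds/nseconds split described above can be sketched in Python. This is only an illustration: the helper names to_nfstime4 and from_nfstime4 are hypothetical and are not part of the protocol or of any draft text. The sketch assumes the time is held as a signed count of nanoseconds relative to 0 hour January 1, 1970.

```python
NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(total_nanoseconds):
    """Split a signed nanosecond offset from the epoch into (seconds, nseconds).

    Floor division keeps nseconds in the range 0..999,999,999 even when the
    time is before the epoch, so nseconds is always added to seconds.
    """
    seconds, nseconds = divmod(total_nanoseconds, NSEC_PER_SEC)
    return seconds, nseconds


def from_nfstime4(seconds, nseconds):
    """Recombine the two fields, rejecting an out-of-range nseconds value."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds must be in 0..999,999,999")
    return seconds * NSEC_PER_SEC + nseconds
```

With these helpers, one-half second before the epoch (-500000000 nanoseconds) maps to seconds = -1 and nseconds = 500000000, matching the example in the text.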
2.2.2. time_how4 2.2.2. time_how4
enum time_how4 { enum time_how4 {
SET_TO_SERVER_TIME4 = 0, SET_TO_SERVER_TIME4 = 0,
SET_TO_CLIENT_TIME4 = 1 SET_TO_CLIENT_TIME4 = 1
}; };
skipping to change at page 21, line 42 skipping to change at page 20, line 22
This data type represents additional information for the device file This data type represents additional information for the device file
types NF4CHR and NF4BLK. types NF4CHR and NF4BLK.
2.2.5. fsid4 2.2.5. fsid4
struct fsid4 { struct fsid4 {
uint64_t major; uint64_t major;
uint64_t minor; uint64_t minor;
}; };
This type is the filesystem identifier that is used as a mandatory This type is the file system identifier that is used as a mandatory
attribute. attribute.
2.2.6. fs_location4 2.2.6. fs_location4
struct fs_location4 { struct fs_location4 {
utf8val_REQUIRED4 server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
2.2.7. fs_locations4 2.2.7. fs_locations4
struct fs_locations4 { struct fs_locations4 {
pathname4 fs_root; pathname4 fs_root;
fs_location4 locations<>; fs_location4 locations<>;
}; };
skipping to change at page 22, line 46 skipping to change at page 21, line 25
2.2.9. change_info4 2.2.9. change_info4
struct change_info4 { struct change_info4 {
bool atomic; bool atomic;
changeid4 before; changeid4 before;
changeid4 after; changeid4 after;
}; };
This structure is used with the CREATE, LINK, REMOVE, RENAME This structure is used with the CREATE, LINK, REMOVE, RENAME
operations to let the client know the value of the change attribute operations to let the client know the value of the change attribute
for the directory in which the target filesystem object resides. for the directory in which the target file system object resides.
2.2.10. clientaddr4 2.2.10. clientaddr4
struct clientaddr4 { struct clientaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string r_netid<>; /* network id */ string r_netid<>; /* network id */
string r_addr<>; /* universal address */ string r_addr<>; /* universal address */
}; };
The clientaddr4 structure is used as part of the SETCLIENTID The clientaddr4 structure is used as part of the SETCLIENTID
skipping to change at page 23, line 39 skipping to change at page 22, line 13
back address; includes the program number and client address. back address; includes the program number and client address.
2.2.12. nfs_client_id4 2.2.12. nfs_client_id4
struct nfs_client_id4 { struct nfs_client_id4 {
verifier4 verifier; verifier4 verifier;
opaque id<NFS4_OPAQUE_LIMIT>; opaque id<NFS4_OPAQUE_LIMIT>;
}; };
This structure is part of the arguments to the SETCLIENTID operation. This structure is part of the arguments to the SETCLIENTID operation.
NFS4_OPAQUE_LIMIT is defined as 1024.
2.2.13. open_owner4 2.2.13. open_owner4
struct open_owner4 { struct open_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
This structure is used to identify the owner of open state. This structure is used to identify the owner of open state.
NFS4_OPAQUE_LIMIT is defined as 1024.
2.2.14. lock_owner4 2.2.14. lock_owner4
struct lock_owner4 { struct lock_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
This structure is used to identify the owner of file locking state. This structure is used to identify the owner of file locking state.
NFS4_OPAQUE_LIMIT is defined as 1024.
2.2.15. open_to_lock_owner4 2.2.15. open_to_lock_owner4
struct open_to_lock_owner4 { struct open_to_lock_owner4 {
seqid4 open_seqid; seqid4 open_seqid;
stateid4 open_stateid; stateid4 open_stateid;
seqid4 lock_seqid; seqid4 lock_seqid;
lock_owner4 lock_owner; lock_owner4 lock_owner;
}; };
skipping to change at page 24, line 35 skipping to change at page 23, line 9
open_owner4. It provides both the open_stateid and lock_owner such open_owner4. It provides both the open_stateid and lock_owner such
that the transition is made from a valid open_stateid sequence to that the transition is made from a valid open_stateid sequence to
that of the new lock_stateid sequence. Using this mechanism avoids that of the new lock_stateid sequence. Using this mechanism avoids
the confirmation of the lock_owner/lock_seqid pair since it is tied the confirmation of the lock_owner/lock_seqid pair since it is tied
to established state in the form of the open_stateid/open_seqid. to established state in the form of the open_stateid/open_seqid.
2.2.16. stateid4 2.2.16. stateid4
struct stateid4 { struct stateid4 {
uint32_t seqid; uint32_t seqid;
opaque other[12]; opaque other[NFS4_OTHER_SIZE];
}; };
This structure is used for the various state sharing mechanisms This structure is used for the various state sharing mechanisms
between the client and server. For the client, this data structure between the client and server. For the client, this data structure
is read-only. The server is required to increment the seqid field is read-only. The server is required to increment the seqid field
monotonically at each transition of the stateid. This is important monotonically at each transition of the stateid. This is important
since the client will inspect the seqid in OPEN stateids to determine since the client will inspect the seqid in OPEN stateids to determine
the order of OPEN processing done by the server. the order of OPEN processing done by the server.
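
As a rough illustration of how stateid4 looks on the wire, the following Python sketch packs and unpacks the structure using the XDR rules of [RFC4506] (a 4-byte big-endian unsigned integer followed by a fixed-length opaque). The function names are hypothetical. The value 12 for NFS4_OTHER_SIZE is taken from the earlier revision's "opaque other[12]" and is assumed to be unchanged.

```python
import struct

NFS4_OTHER_SIZE = 12  # assumed from the earlier revision's "opaque other[12]"


def encode_stateid4(seqid, other):
    """Encode a stateid4 as XDR: uint32 seqid, then fixed-length opaque other."""
    if len(other) != NFS4_OTHER_SIZE:
        raise ValueError("other must be exactly NFS4_OTHER_SIZE bytes")
    # 12 is already a multiple of 4, so no XDR padding bytes are needed.
    return struct.pack(">I", seqid) + bytes(other)


def decode_stateid4(data):
    """Decode the XDR form produced by encode_stateid4."""
    if len(data) != 4 + NFS4_OTHER_SIZE:
        raise ValueError("a stateid4 occupies exactly 16 bytes")
    (seqid,) = struct.unpack(">I", data[:4])
    return seqid, bytes(data[4:])
```

A client would treat the decoded value as read-only and simply echo it back to the server. Only the server advances seqid.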
3. RPC and Security Flavor 3. RPC and Security Flavor
The NFSv4 protocol is a Remote Procedure Call (RPC) application that The NFSv4 protocol is an RPC application that uses RPC version 2 and
uses RPC version 2 and the corresponding eXternal Data Representation the XDR as defined in [RFC5531] and [RFC4506]. The RPCSEC_GSS
(XDR) as defined in [RFC5531] and [RFC4506]. The RPCSEC_GSS security security flavors as defined in version 1 ([RFC2203]) and version 2
flavor as defined in [RFC2203] MUST be implemented as the mechanism ([RFC5403]) MUST be implemented as the mechanism to deliver stronger
to deliver stronger security for the NFSv4 protocol. However, security for the NFSv4 protocol. However, deployment of RPCSEC_GSS
deployment of RPCSEC_GSS is optional. is optional.
3.1. Ports and Transports 3.1. Ports and Transports
Historically, NFSv2 and NFSv3 servers have resided on port 2049. The Historically, NFSv2 and NFSv3 servers have resided on port 2049. The
registered port 2049 [RFC3232] for the NFS protocol SHOULD be the registered port 2049 [RFC3232] for the NFS protocol SHOULD be the
default configuration. Using the registered port for NFS services default configuration. Using the registered port for NFS services
means the NFS client will not need to use the RPC binding protocols means the NFS client will not need to use the RPC binding protocols
as described in [RFC1833]; this will allow NFS to transit firewalls. as described in [RFC1833]; this will allow NFS to transit firewalls.
Where an NFSv4 implementation supports operation over the IP network Where an NFSv4 implementation supports operation over the IP network
protocol, the supported transports between NFS and IP MUST be among protocol, the supported transport layer between NFS and IP MUST be an
the IETF-approved congestion control transport protocols, which IETF standardised transport protocol that is specified to avoid
include TCP and SCTP. To enhance the possibilities for network congestion; such transports include TCP and SCTP. To enhance
interoperability, an NFSv4 implementation MUST support operation over the possibilities for interoperability, an NFSv4 implementation MUST
the TCP transport protocol, at least until such time as a standards support operation over the TCP transport protocol, at least until
track RFC revises this requirement to use a different IETF-approved such time as a standards track RFC revises this requirement to use a
congestion control transport protocol. different IETF standardised transport protocol with appropriate
congestion control.
If TCP is used as the transport, the client and server SHOULD use If TCP is used as the transport, the client and server SHOULD use
persistent connections. This will prevent the weakening of TCP's persistent connections. This will prevent the weakening of TCP's
congestion control via short lived connections and will improve congestion control via short lived connections and will improve
performance for the WAN environment by eliminating the need for SYN performance for the Wide Area Network (WAN) environment by
handshakes. eliminating the need for SYN handshakes.
To date, all NFSv4 implementations are TCP based, i.e., there are To date, all NFSv4 implementations are TCP based, i.e., there are
none for SCTP nor UDP. UDP by itself is not sufficient as a none for SCTP nor UDP. UDP by itself is not sufficient as a
transport for NFSv4, neither is UDP in combination with some other transport for NFSv4, neither is UDP in combination with some other
mechanism (e.g., DCCP [RFC4340], NORM [RFC5740]). mechanism (e.g., DCCP [RFC4340], NORM [RFC5740]).
As noted in Section 17, the authentication model for NFSv4 has moved As noted in Section 17, the authentication model for NFSv4 has moved
from machine-based to principal-based. However, this modification of from machine-based to principal-based. However, this modification of
the authentication model does not imply a technical requirement to the authentication model does not imply a technical requirement to
move the TCP connection management model from whole machine-based to move the TCP connection management model from whole machine-based to
one based on a per user model. In particular, NFS over TCP client one based on a per user model. In particular, NFS over TCP client
implementations have traditionally multiplexed traffic for multiple implementations have traditionally multiplexed traffic for multiple
users over a common TCP connection between an NFS client and server. users over a common TCP connection between an NFS client and server.
This has been true, regardless whether the NFS client is using This has been true, regardless of whether the NFS client is using
AUTH_SYS, AUTH_DH, RPCSEC_GSS or any other flavor. Similarly, NFS AUTH_SYS, AUTH_DH, RPCSEC_GSS or any other flavor. Similarly, NFS
over TCP server implementations have assumed such a model and thus over TCP server implementations have assumed such a model and thus
scale the implementation of TCP connection management in proportion scale the implementation of TCP connection management in proportion
to the number of expected client machines. It is intended that NFSv4 to the number of expected client machines. It is intended that NFSv4
will not modify this connection management model. NFSv4 clients that will not modify this connection management model. NFSv4 clients that
violate this assumption can expect scaling issues on the server and violate this assumption can expect scaling issues on the server and
hence reduced service. hence reduced service.
Note that for various timers, the client and server should avoid
inadvertent synchronization of those timers. For further discussion
of the general issue refer to [Floyd].
3.1.1. Client Retransmission Behavior 3.1.1. Client Retransmission Behavior
When processing a request received over a reliable transport such as When processing an NFSv4 request received over a reliable transport
TCP, the NFSv4 server MUST NOT silently drop the request, except if such as TCP, the NFSv4 server MUST NOT silently drop the request,
the transport connection has been broken. Given such a contract except if the established transport connection has been broken.
between NFSv4 clients and servers, clients MUST NOT retry a request Given such a contract between NFSv4 clients and servers, clients MUST
unless one or both of the following are true: NOT retry a request unless one or both of the following are true:
o The transport connection has been broken o The transport connection has been broken
o The procedure being retried is the NULL procedure o The procedure being retried is the NULL procedure
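
The retry rule in the two bullets above reduces to a single predicate. The Python sketch below is illustrative only. The function name and its two boolean parameters are hypothetical, and how a client learns that the connection has been broken is outside the sketch.

```python
def may_retry(connection_broken, is_null_procedure):
    """Return True if a client is permitted to retry a request.

    A retry is allowed only when the transport connection has been broken,
    when the request being retried is the NULL procedure, or both.
    """
    return connection_broken or is_null_procedure
```

A client that sees no reply on an intact connection therefore keeps waiting for the original request rather than resending it.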
Since reliable transports, such as TCP, do not always synchronously Since reliable transports, such as TCP, do not always synchronously
inform a peer when the other peer has broken the connection (for inform a peer when the other peer has broken the connection (for
example, when an NFS server reboots), the NFSv4 client may want to example, when an NFS server reboots), the NFSv4 client may want to
actively "probe" the connection to see if it has been broken. Use of actively "probe" the connection to see if it has been broken. Use of
the NULL procedure is one recommended way to do so. So, when a the NULL procedure is one recommended way to do so. So, when a
skipping to change at page 27, line 22 skipping to change at page 25, line 38
protection (QOP), and service (authentication, integrity, privacy). protection (QOP), and service (authentication, integrity, privacy).
For the mandated security mechanisms, NFSv4 specifies that a QOP of For the mandated security mechanisms, NFSv4 specifies that a QOP of
zero is used, leaving it up to the mechanism or the mechanism's zero is used, leaving it up to the mechanism or the mechanism's
configuration to map QOP zero to an appropriate level of protection. configuration to map QOP zero to an appropriate level of protection.
Each mandated mechanism specifies a minimum set of cryptographic Each mandated mechanism specifies a minimum set of cryptographic
algorithms for implementing integrity and privacy. NFSv4 clients and algorithms for implementing integrity and privacy. NFSv4 clients and
servers MUST be implemented on operating environments that comply servers MUST be implemented on operating environments that comply
with the REQUIRED cryptographic algorithms of each REQUIRED with the REQUIRED cryptographic algorithms of each REQUIRED
mechanism. mechanism.
3.2.1.1. Kerberos V5 as a security triple 3.2.1.1. Kerberos V5 as a Security Triple
The Kerberos V5 GSS-API mechanism as described in [RFC4121] MUST be The Kerberos V5 GSS-API mechanism as described in [RFC4121] MUST be
implemented with the RPCSEC_GSS services as specified in the implemented with the RPCSEC_GSS services as specified in the
following table: following table:
column descriptions: column descriptions:
1 == number of pseudo flavor 1 == number of pseudo flavor
2 == name of pseudo flavor 2 == name of pseudo flavor
3 == mechanism's OID 3 == mechanism's OID
4 == RPCSEC_GSS service 4 == RPCSEC_GSS service
skipping to change at page 28, line 29 skipping to change at page 27, line 4
or server, so there is some due diligence required by the user of or server, so there is some due diligence required by the user of
NFSv4 to ensure that security is acceptable where needed. Guidance NFSv4 to ensure that security is acceptable where needed. Guidance
is provided in [RFC6649] as to why weak algorithms should be disabled is provided in [RFC6649] as to why weak algorithms should be disabled
by default. by default.
3.3. Security Negotiation 3.3. Security Negotiation
With the NFSv4 server potentially offering multiple security With the NFSv4 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The mechanism is to be used for its communication with the server. The
NFS server may have multiple points within its filesystem name space NFS server may have multiple points within its file system name space
that are available for use by NFS clients. In turn the NFS server that are available for use by NFS clients. In turn the NFS server
may be configured such that each of these entry points may have may be configured such that each of these entry points may have
different or multiple security mechanisms in use. different or multiple security mechanisms in use.
The security negotiation between client and server SHOULD be done The security negotiation between client and server SHOULD be done
with a secure channel to eliminate the possibility of a third party with a secure channel to eliminate the possibility of a third party
intercepting the negotiation sequence and forcing the client and intercepting the negotiation sequence and forcing the client and
server to choose a lower level of security than required or desired. server to choose a lower level of security than required or desired.
See Section 17 for further discussion. See Section 17 for further discussion.
3.3.1. SECINFO 3.3.1. SECINFO
The new SECINFO operation will allow the client to determine, on a The new SECINFO operation will allow the client to determine, on a
per filehandle basis, what security triple is to be used for server per filehandle basis, what security triple (see [RFC2743]) is to be
access. In general, the client will not have to use the SECINFO used for server access. In general, the client will not have to use
operation except during initial communication with the server or when the SECINFO operation except during initial communication with the
the client crosses policy boundaries at the server. It is possible server or when the client crosses policy boundaries at the server.
that the server's policies change during the client's interaction It is possible that the server's policies change during the client's
therefore forcing the client to negotiate a new security triple. interaction therefore forcing the client to negotiate a new security
triple.
3.3.2. Security Error 3.3.2. Security Error
Based on the assumption that each NFSv4 client and server MUST Based on the assumption that each NFSv4 client and server MUST
support a minimum set of security (i.e., Kerberos-V5 under support a minimum set of security (i.e., Kerberos-V5 under
RPCSEC_GSS), the NFS client will start its communication with the RPCSEC_GSS), the NFS client will start its communication with the
server with one of the minimal security triples. During server with one of the minimal security triples. During
communication with the server, the client may receive an NFS error of communication with the server, the client may receive an NFS error of
NFS4ERR_WRONGSEC. This error allows the server to notify the client NFS4ERR_WRONGSEC. This error allows the server to notify the client
that the security triple currently being used is not appropriate for that the security triple currently being used is not appropriate for
access to the server's filesystem resources. The client is then access to the server's file system resources. The client is then
responsible for determining what security triples are available at responsible for determining what security triples are available at
the server and choosing one which is appropriate for the client. See the server and choosing one which is appropriate for the client. See
Section 15.33 for further discussion of how the client will respond Section 15.33 for further discussion of how the client will respond
to the NFS4ERR_WRONGSEC error and use SECINFO. to the NFS4ERR_WRONGSEC error and use SECINFO.
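
The selection step that follows an NFS4ERR_WRONGSEC error might be sketched as below. This is a hedged illustration, not the procedure defined in Section 15.33. The function name is hypothetical, the triples are modelled as plain (mechanism, QOP, service) tuples with made-up labels, and walking the server's list in order is an assumption of the sketch.

```python
def choose_triple(offered_by_server, supported_by_client):
    """Pick the first server-offered security triple the client also supports.

    Each triple is a (mechanism, qop, service) tuple. Returns None when the
    client supports none of the triples the server offered.
    """
    for triple in offered_by_server:
        if triple in supported_by_client:
            return triple
    return None


# Illustrative labels only; real triples carry a mechanism OID.
offered = [("krb5", 0, "privacy"), ("krb5", 0, "integrity")]
supported = {("krb5", 0, "integrity"), ("krb5", 0, "authentication")}
selected = choose_triple(offered, supported)
```

Here selected is ("krb5", 0, "integrity"), the only triple common to both sides. A result of None means the client cannot access that part of the server's name space.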
3.3.3. Callback RPC Authentication 3.3.3. Callback RPC Authentication
Except as noted elsewhere in this section, the callback RPC Except as noted elsewhere in this section, the callback RPC
(described later) MUST mutually authenticate the NFS server to the (described later) MUST mutually authenticate the NFS server to the
principal that acquired the client ID (also described later), using principal that acquired the client ID (also described later), using
skipping to change at page 30, line 21 skipping to change at page 28, line 42
For Kerberos V5, nfs/hostname would be a server principal in the For Kerberos V5, nfs/hostname would be a server principal in the
Kerberos Key Distribution Center database. This is the same Kerberos Key Distribution Center database. This is the same
principal the client acquired a GSS-API context for when it issued principal the client acquired a GSS-API context for when it issued
the SETCLIENTID operation; therefore, the realm name for the server the SETCLIENTID operation; therefore, the realm name for the server
principal must be the same for the callback as it was for the principal must be the same for the callback as it was for the
SETCLIENTID. SETCLIENTID.
4. Filehandles 4. Filehandles
The filehandle in the NFS protocol is a per server unique identifier The filehandle in the NFS protocol is a per server unique identifier
for a filesystem object. The contents of the filehandle are opaque for a file system object. The contents of the filehandle are opaque
to the client. Therefore, the server is responsible for translating to the client. Therefore, the server is responsible for translating
the filehandle to an internal representation of the filesystem the filehandle to an internal representation of the file system
object. object.
4.1. Obtaining the First Filehandle 4.1. Obtaining the First Filehandle
The operations of the NFS protocol are defined in terms of one or The operations of the NFS protocol are defined in terms of one or
more filehandles. Therefore, the client needs a filehandle to more filehandles. Therefore, the client needs a filehandle to
initiate communication with the server. With the NFSv2 protocol initiate communication with the server. With the NFSv2 protocol
[RFC1094] and the NFSv3 protocol [RFC1813], there exists an ancillary [RFC1094] and the NFSv3 protocol [RFC1813], there exists an ancillary
protocol to obtain this first filehandle. The MOUNT protocol, RPC protocol to obtain this first filehandle. The MOUNT protocol, RPC
program number 100005, provides the mechanism of translating a string program number 100005, provides the mechanism of translating a string
based filesystem path name to a filehandle which can then be used by based file system path name to a filehandle which can then be used by
the NFS protocols. the NFS protocols.
The MOUNT protocol has deficiencies in the area of security and use The MOUNT protocol has deficiencies in the area of security and use
via firewalls. This is one reason that the use of the public via firewalls. This is one reason that the use of the public
filehandle was introduced in [RFC2054] and [RFC2055]. With the use filehandle was introduced in [RFC2054] and [RFC2055]. With the use
of the public filehandle in combination with the LOOKUP operation in of the public filehandle in combination with the LOOKUP operation in
the NFSv2 and NFSv3 protocols, it has been demonstrated that the the NFSv2 and NFSv3 protocols, it has been demonstrated that the
MOUNT protocol is unnecessary for viable interaction between NFS MOUNT protocol is unnecessary for viable interaction between NFS
client and server. client and server.
Therefore, the NFSv4 protocol will not use an ancillary protocol for Therefore, the NFSv4 protocol will not use an ancillary protocol for
translation from string based path names to a filehandle. Two translation from string based path names to a filehandle. Two
special filehandles will be used as starting points for the NFS special filehandles will be used as starting points for the NFS
client. client.
4.1.1. Root Filehandle 4.1.1. Root Filehandle
The first of the special filehandles is the ROOT filehandle. The The first of the special filehandles is the ROOT filehandle. The
ROOT filehandle is the "conceptual" root of the filesystem name space ROOT filehandle is the "conceptual" root of the file system name
at the NFS server. The client uses or starts with the ROOT space at the NFS server. The client uses or starts with the ROOT
filehandle by employing the PUTROOTFH operation. The PUTROOTFH filehandle by employing the PUTROOTFH operation. The PUTROOTFH
operation instructs the server to set the "current" filehandle to the operation instructs the server to set the "current" filehandle to the
ROOT of the server's file tree. Once this PUTROOTFH operation is ROOT of the server's file tree. Once this PUTROOTFH operation is
used, the client can then traverse the entirety of the server's file used, the client can then traverse the entirety of the server's file
tree with the LOOKUP operation. A complete discussion of the server tree with the LOOKUP operation. A complete discussion of the server
name space is in Section 8. name space is in Section 7.
4.1.2. Public Filehandle 4.1.2. Public Filehandle
The second special filehandle is the PUBLIC filehandle. Unlike the The second special filehandle is the PUBLIC filehandle. Unlike the
ROOT filehandle, the PUBLIC filehandle may be bound to or represent an ROOT filehandle, the PUBLIC filehandle may be bound to or represent an
arbitrary filesystem object at the server. The server is responsible arbitrary file system object at the server. The server is
for this binding. It may be that the PUBLIC filehandle and the ROOT responsible for this binding. It may be that the PUBLIC filehandle
filehandle refer to the same filesystem object. However, it is up to and the ROOT filehandle refer to the same file system object.
the administrative software at the server and the policies of the However, it is up to the administrative software at the server and
server administrator to define the binding of the PUBLIC filehandle the policies of the server administrator to define the binding of the
and server filesystem object. The client may not make any PUBLIC filehandle and server file system object. The client may not
assumptions about this binding. The client uses the PUBLIC make any assumptions about this binding. The client uses the PUBLIC
filehandle via the PUTPUBFH operation. filehandle via the PUTPUBFH operation.
4.2. Filehandle Types 4.2. Filehandle Types
In the NFSv2 and NFSv3 protocols, there was one type of filehandle In the NFSv2 and NFSv3 protocols, there was one type of filehandle
with a single set of semantics. This type of filehandle is termed with a single set of semantics. This type of filehandle is termed
"persistent" in NFS Version 4. The semantics of a persistent "persistent" in NFS Version 4. The semantics of a persistent
filehandle remain the same as before. A new type of filehandle filehandle remain the same as before. A new type of filehandle
introduced in NFS Version 4 is the "volatile" filehandle, which introduced in NFS Version 4 is the "volatile" filehandle, which
attempts to accommodate certain server environments. attempts to accommodate certain server environments.
The volatile filehandle type was introduced to address server The volatile filehandle type was introduced to address server
functionality or implementation issues which make correct functionality or implementation issues which make correct
implementation of a persistent filehandle infeasible. Some server implementation of a persistent filehandle infeasible. Some server
environments do not provide a filesystem level invariant that can be environments do not provide a file system level invariant that can be
used to construct a persistent filehandle. The underlying server used to construct a persistent filehandle. The underlying server
filesystem may not provide the invariant or the server's filesystem file system may not provide the invariant or the server's file system
programming interfaces may not provide access to the needed programming interfaces may not provide access to the needed
invariant. Volatile filehandles may ease the implementation of invariant. Volatile filehandles may ease the implementation of
server functionality such as hierarchical storage management or server functionality such as hierarchical storage management or file
filesystem reorganization or migration. However, the volatile system reorganization or migration. However, the volatile filehandle
filehandle increases the implementation burden for the client. increases the implementation burden for the client.
Since the client will need to handle persistent and volatile Since the client will need to handle persistent and volatile
filehandles differently, a file attribute is defined which may be filehandles differently, a file attribute is defined which may be
used by the client to determine the filehandle types being returned used by the client to determine the filehandle types being returned
by the server. by the server.
4.2.1. General Properties of a Filehandle 4.2.1. General Properties of a Filehandle
The filehandle contains all the information the server needs to The filehandle contains all the information the server needs to
distinguish an individual file. To the client, the filehandle is distinguish an individual file. To the client, the filehandle is
opaque. The client stores filehandles for use in a later request and opaque. The client stores filehandles for use in a later request and
can compare two filehandles from the same server for equality by can compare two filehandles from the same server for equality by
doing a byte-by-byte comparison. However, the client MUST NOT doing a byte-by-byte comparison. However, the client MUST NOT
otherwise interpret the contents of filehandles. If two filehandles otherwise interpret the contents of filehandles. If two filehandles
from the same server are equal, they MUST refer to the same file. from the same server are equal, they MUST refer to the same file
Servers SHOULD try to maintain a one-to-one correspondence between system object. Servers SHOULD try to maintain a one-to-one
filehandles and files but this is not required. Clients MUST use correspondence between filehandles and file system objects but this
filehandle comparisons only to improve performance, not for correct is not required. Clients MUST use filehandle comparisons only to
behavior. All clients need to be prepared for situations in which it improve performance, not for correct behavior. All clients need to
cannot be determined whether two filehandles denote the same object be prepared for situations in which it cannot be determined whether
and in such cases, avoid making invalid assumptions which might cause two filehandles denote the same object and in such cases, avoid
incorrect behavior. Further discussion of filehandle and attribute making invalid assumptions which might cause incorrect behavior.
comparison in the context of data caching is presented in Further discussion of filehandle and attribute comparison in the
Section 10.3.4. context of data caching is presented in Section 10.3.4.
As an example, in the case that two different path names when As an example, in the case that two different path names when
traversed at the server terminate at the same filesystem object, the traversed at the server terminate at the same file system object, the
server SHOULD return the same filehandle for each path. This can server SHOULD return the same filehandle for each path. This can
occur if a hard link is used to create two file names which refer to occur if a hard link is used to create two file names which refer to
the same underlying file object and associated data. For example, if the same underlying file object and associated data. For example, if
paths /a/b/c and /a/d/c refer to the same file, the server SHOULD paths /a/b/c and /a/d/c refer to the same file, the server SHOULD
return the same filehandle for both path name traversals. return the same filehandle for both path name traversals.
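The comparison rule above can be sketched as follows. This is a minimal illustration, not part of either draft: the function name same_object and the use of None for "cannot be determined" are hypothetical. It assumes both filehandles came from the same server.

```python
from typing import Optional


def same_object(fh_a: bytes, fh_b: bytes) -> Optional[bool]:
    """Compare two opaque filehandles obtained from the same server.

    Returns True when the handles are byte-for-byte equal, in which case
    they refer to the same file system object. Returns None otherwise,
    because a one-to-one correspondence between filehandles and objects
    is not required, so unequal handles prove nothing.
    """
    if fh_a == fh_b:
        return True
    return None  # undetermined; the caller must not assume "different"
```

A caller would treat a None result as "unknown" and fall back to other means, using a True result only as a performance shortcut.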
4.2.2. Persistent Filehandle 4.2.2. Persistent Filehandle
A persistent filehandle is defined as having a fixed value for the A persistent filehandle is defined as having a fixed value for the
lifetime of the filesystem object to which it refers. Once the lifetime of the file system object to which it refers. Once the
server creates the filehandle for a filesystem object, the server server creates the filehandle for a file system object, the server
MUST accept the same filehandle for the object for the lifetime of MUST accept the same filehandle for the object for the lifetime of
the object. If the server restarts or reboots, the NFS server must the object. If the server restarts or reboots, the NFS server must
honor the same filehandle value as it did in the server's previous honor the same filehandle value as it did in the server's previous
instantiation. Similarly, if the filesystem is migrated, the new NFS instantiation. Similarly, if the file system is migrated, the new
server must honor the same filehandle as the old NFS server. NFS server must honor the same filehandle as the old NFS server.
The persistent filehandle will become stale or invalid when the The persistent filehandle will become stale or invalid when the
filesystem object is removed. When the server is presented with a file system object is removed. When the server is presented with a
persistent filehandle that refers to a deleted object, it MUST return persistent filehandle that refers to a deleted object, it MUST return
an error of NFS4ERR_STALE. A filehandle may become stale when the an error of NFS4ERR_STALE. A filehandle may become stale when the
filesystem containing the object is no longer available. The file file system containing the object is no longer available. The file
system may become unavailable if it exists on removable media and the system may become unavailable if it exists on removable media and the
media is no longer available at the server or the filesystem in whole media is no longer available at the server or the file system in
has been destroyed or the filesystem has simply been removed from the whole has been destroyed or the file system has simply been removed
server's name space (i.e., unmounted in a UNIX environment). from the server's name space (i.e., unmounted in a UNIX environment).
4.2.3. Volatile Filehandle 4.2.3. Volatile Filehandle
A volatile filehandle does not share the same longevity A volatile filehandle does not share the same longevity
characteristics of a persistent filehandle. The server may determine characteristics of a persistent filehandle. The server may determine
that a volatile filehandle is no longer valid at many different that a volatile filehandle is no longer valid at many different
points in time. If the server can definitively determine that a points in time. If the server can definitively determine that a
volatile filehandle refers to an object that has been removed, the volatile filehandle refers to an object that has been removed, the
server should return NFS4ERR_STALE to the client (as is the case for server should return NFS4ERR_STALE to the client (as is the case for
persistent filehandles). In all other cases where the server persistent filehandles). In all other cases where the server
determines that a volatile filehandle can no longer be used, it determines that a volatile filehandle can no longer be used, it
should return an error of NFS4ERR_FHEXPIRED. should return an error of NFS4ERR_FHEXPIRED.
The mandatory attribute "fh_expire_type" is used by the client to The mandatory attribute "fh_expire_type" is used by the client to
determine what type of filehandle the server is providing for a determine what type of filehandle the server is providing for a
particular filesystem. This attribute is a bitmask with the particular file system. This attribute is a bitmask with the
following values: following values:
FH4_PERSISTENT: The value of FH4_PERSISTENT is used to indicate a FH4_PERSISTENT: The value of FH4_PERSISTENT is used to indicate a
persistent filehandle, which is valid until the object is removed persistent filehandle, which is valid until the object is removed
from the filesystem. The server will not return NFS4ERR_FHEXPIRED from the file system. The server will not return
for this filehandle. FH4_PERSISTENT is defined as a value in NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined
which none of the bits specified below are set. as a value in which none of the bits specified below are set.
FH4_VOLATILE_ANY: The filehandle may expire at any time, except as FH4_VOLATILE_ANY: The filehandle may expire at any time, except as
specifically excluded (i.e., FH4_NOEXPIRE_WITH_OPEN). specifically excluded (i.e., FH4_NOEXPIRE_WITH_OPEN).
FH4_NOEXPIRE_WITH_OPEN: May only be set when FH4_VOLATILE_ANY is FH4_NOEXPIRE_WITH_OPEN: May only be set when FH4_VOLATILE_ANY is
set. If this bit is set, then the meaning of FH4_VOLATILE_ANY is set. If this bit is set, then the meaning of FH4_VOLATILE_ANY is
qualified to exclude any expiration of the filehandle when it is qualified to exclude any expiration of the filehandle when it is
open. open.
FH4_VOL_MIGRATION: The filehandle will expire as a result of FH4_VOL_MIGRATION: The filehandle will expire as a result of
skipping to change at page 34, line 44 skipping to change at page 33, line 19
NFS4ERR_BADHANDLE. If the generation number does not match, return NFS4ERR_BADHANDLE. If the generation number does not match, return
NFS4ERR_FHEXPIRED. NFS4ERR_FHEXPIRED.
When the server reboots, the table is gone (it is volatile). When the server reboots, the table is gone (it is volatile).
If volatile bit is 0, then it is a persistent filehandle with a If volatile bit is 0, then it is a persistent filehandle with a
different structure following it. different structure following it.
4.3. Client Recovery from Filehandle Expiration 4.3. Client Recovery from Filehandle Expiration
If possible, the client SHOULD recover from the receipt of an If possible, the client should recover from the receipt of an
NFS4ERR_FHEXPIRED error. The client must take on additional NFS4ERR_FHEXPIRED error. The client must take on additional
responsibility so that it may prepare itself to recover from the responsibility so that it may prepare itself to recover from the
expiration of a volatile filehandle. If the server returns expiration of a volatile filehandle. If the server returns
persistent filehandles, the client does not need these additional persistent filehandles, the client does not need these additional
steps. steps.
For volatile filehandles, most commonly the client will need to store For volatile filehandles, most commonly the client will need to store
the component names leading up to and including the filesystem object the component names leading up to and including the file system
in question. With these names, the client should be able to recover object in question. With these names, the client should be able to
by finding a filehandle in the name space that is still available or recover by finding a filehandle in the name space that is still
by starting at the root of the server's filesystem name space. available or by starting at the root of the server's file system name
space.
If the expired filehandle refers to an object that has been removed If the expired filehandle refers to an object that has been removed
from the filesystem, obviously the client will not be able to recover from the file system, obviously the client will not be able to
from the expired filehandle. recover from the expired filehandle.
It is also possible that the expired filehandle refers to a file that It is also possible that the expired filehandle refers to a file that
has been renamed. If the file was renamed by another client, again has been renamed. If the file was renamed by another client, again
it is possible that the original client will not be able to recover. it is possible that the original client will not be able to recover.
However, in the case that the client itself is renaming the file and However, in the case that the client itself is renaming the file and
the file is open, it is possible that the client may be able to the file is open, it is possible that the client may be able to
recover. The client can determine the new path name based on the recover. The client can determine the new path name based on the
processing of the rename request. The client can then regenerate the processing of the rename request. The client can then regenerate the
new filehandle based on the new path name. The client could also use new filehandle based on the new path name. The client could also use
the compound operation mechanism to construct a set of operations the compound operation mechanism to construct a set of operations
like: like:
RENAME A B RENAME A B
LOOKUP B LOOKUP B
GETFH GETFH
Note that the COMPOUND procedure does not provide atomicity. This Note that the COMPOUND procedure does not provide atomicity. This
example only reduces the overhead of recovering from an expired example only reduces the overhead of recovering from an expired
filehandle. filehandle.
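As an illustration of recovery from stored component names, the sketch below builds the sequence of operations a client might place in one COMPOUND request to walk from the ROOT filehandle to the object and fetch a fresh filehandle. The function name and the tuple representation of operations are hypothetical. No wire encoding is attempted, and, as noted above, the sequence is not atomic.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, ...]


def build_recovery_compound(components: Sequence[str]) -> List[Op]:
    """Build an operation list that re-derives a filehandle by name.

    components holds the stored path component names leading up to and
    including the object, relative to the root of the server's name space.
    """
    ops: List[Op] = [("PUTROOTFH",)]      # start at the ROOT filehandle
    for name in components:
        ops.append(("LOOKUP", name))      # descend one component at a time
    ops.append(("GETFH",))                # fetch the resulting filehandle
    return ops
```

If any LOOKUP fails, for example because the object was removed or renamed by another client, the client cannot recover by this route.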
5. File Attributes 5. Attributes
To meet the requirements of extensibility and increased To meet the requirements of extensibility and increased
interoperability with non-UNIX platforms, attributes need to be interoperability with non-UNIX platforms, attributes need to be
handled in a flexible manner. The NFSv3 fattr3 structure contains a handled in a flexible manner. The NFSv3 fattr3 structure contains a
fixed list of attributes that not all clients and servers are able to fixed list of attributes that not all clients and servers are able to
support or care about. The fattr3 structure cannot be extended as support or care about. The fattr3 structure cannot be extended as
new needs arise and it provides no way to indicate non-support. With new needs arise and it provides no way to indicate non-support. With
the NFSv4.0 protocol, the client is able to query what attributes the the NFSv4.0 protocol, the client is able to query what attributes the
server supports and construct requests with only those supported server supports and construct requests with only those supported
attributes (or a subset thereof). attributes (or a subset thereof).
skipping to change at page 36, line 9 skipping to change at page 34, line 31
supported in the NFSv4.0 protocol by a specific and well-defined supported in the NFSv4.0 protocol by a specific and well-defined
encoding and are identified by number. They are requested by setting encoding and are identified by number. They are requested by setting
a bit in the bit vector sent in the GETATTR request; the server a bit in the bit vector sent in the GETATTR request; the server
response includes a bit vector to list what attributes were returned response includes a bit vector to list what attributes were returned
in the response. New REQUIRED or RECOMMENDED attributes may be added in the response. New REQUIRED or RECOMMENDED attributes may be added
to the NFSv4 protocol as part of a new minor version by publishing a to the NFSv4 protocol as part of a new minor version by publishing a
Standards Track RFC which allocates a new attribute number value and Standards Track RFC which allocates a new attribute number value and
defines the encoding for the attribute. See Section 11 for further defines the encoding for the attribute. See Section 11 for further
discussion. discussion.
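The bit vector can be sketched as follows. The layout used here is an assumption to be checked against the companion XDR description: a sequence of 32-bit words in which attribute number n occupies bit (n mod 32) of word (n div 32). The function names are hypothetical. The attribute numbers in the usage note are taken from the tables and section headings of this document.

```python
from typing import Iterable, List


def attrs_to_bitmap(attr_ids: Iterable[int]) -> List[int]:
    """Encode attribute numbers as a list of 32-bit words."""
    words: List[int] = []
    for n in attr_ids:
        index, bit = divmod(n, 32)
        while len(words) <= index:   # grow the vector as needed
            words.append(0)
        words[index] |= 1 << bit
    return words


def bitmap_to_attrs(words: Iterable[int]) -> List[int]:
    """Decode a list of 32-bit words back into attribute numbers."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]
```

For example, requesting size (4) and quota_avail_soft (39) needs two words, since 39 falls in the second word. A client could decode the vector in the response in the same way to learn which attributes were actually returned.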
Named attributes are accessed by the new OPENATTR operation, which Named attributes are accessed by the OPENATTR operation, which
accesses a hidden directory of attributes associated with a file accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named or READDIR and contains files whose names represent the named
attributes and whose data bytes are the value of the attribute. For attributes and whose data bytes are the value of the attribute. For
example: example:
+----------+-----------+---------------------------------+ +----------+-----------+---------------------------------+
| LOOKUP | "foo" | ; look up file | | LOOKUP | "foo" | ; look up file |
skipping to change at page 37, line 8 skipping to change at page 35, line 31
5.1. REQUIRED Attributes 5.1. REQUIRED Attributes
These MUST be supported by every NFSv4.0 client and server in order These MUST be supported by every NFSv4.0 client and server in order
to ensure a minimum level of interoperability. The server MUST store to ensure a minimum level of interoperability. The server MUST store
and return these attributes, and the client MUST be able to function and return these attributes, and the client MUST be able to function
with an attribute set limited to these attributes. With just the with an attribute set limited to these attributes. With just the
REQUIRED attributes some client functionality may be impaired or REQUIRED attributes some client functionality may be impaired or
limited in some ways. A client may ask for any of these attributes limited in some ways. A client may ask for any of these attributes
to be returned by setting a bit in the GETATTR request, and the to be returned by setting a bit in the GETATTR request, and the
server must return their value. server MUST return their value.
5.2. RECOMMENDED Attributes 5.2. RECOMMENDED Attributes
These attributes are understood well enough to warrant support in the These attributes are understood well enough to warrant support in the
NFSv4.0 protocol. However, they may not be supported on all clients NFSv4.0 protocol. However, they may not be supported on all clients
and servers. A client MAY ask for any of these attributes to be and servers. A client MAY ask for any of these attributes to be
returned by setting a bit in the GETATTR request but must handle the returned by setting a bit in the GETATTR request but must handle the
case where the server does not return them. A client MAY ask for the case where the server does not return them. A client MAY ask for the
set of attributes the server supports and SHOULD NOT request set of attributes the server supports and SHOULD NOT request
attributes the server does not support. A server should be tolerant attributes the server does not support. A server should be tolerant
skipping to change at page 40, line 15 skipping to change at page 38, line 36
5.6. REQUIRED Attributes - List and Definition References 5.6. REQUIRED Attributes - List and Definition References
The list of REQUIRED attributes appears in Table 2. The meanings of The list of REQUIRED attributes appears in Table 2. The meanings of
the columns of the table are: the columns of the table are:
o Name: The name of the attribute. o Name: The name of the attribute.
o Id: The number assigned to the attribute. In the event of o Id: The number assigned to the attribute. In the event of
conflicts between the assigned number and conflicts between the assigned number and
[I-D.ietf-nfsv4-rfc3530bis-dot-x], the latter is likely [I-D.ietf-nfsv4-rfc3530bis-dot-x], the latter is authoritative,
authoritative, but should be resolved with Errata to this document but in such an event, it should be resolved with Errata to this
and/or [I-D.ietf-nfsv4-rfc3530bis-dot-x]. See [ISEG_errata] for document and/or [I-D.ietf-nfsv4-rfc3530bis-dot-x]. See
the Errata process. [ISEG_errata] for the Errata process.
o Data Type: The XDR data type of the attribute. o Data Type: The XDR data type of the attribute.
o Acc: Access allowed to the attribute. R means read-only (GETATTR o Acc: Access allowed to the attribute. R means read-only (GETATTR
may retrieve, SETATTR may not set). W means write-only (SETATTR may retrieve, SETATTR may not set). W means write-only (SETATTR
may set, GETATTR may not retrieve). R W means read/write (GETATTR may set, GETATTR may not retrieve). R W means read/write (GETATTR
may retrieve, SETATTR may set). may retrieve, SETATTR may set).
o Defined in: The section of this specification that describes the o Defined in: The section of this specification that describes the
attribute. attribute.
skipping to change at page 40, line 44 skipping to change at page 39, line 19
| type | 1 | nfs_ftype4 | R | Section 5.8.1.2 | | type | 1 | nfs_ftype4 | R | Section 5.8.1.2 |
| fh_expire_type | 2 | uint32_t | R | Section 5.8.1.3 | | fh_expire_type | 2 | uint32_t | R | Section 5.8.1.3 |
| change | 3 | uint64_t | R | Section 5.8.1.4 | | change | 3 | uint64_t | R | Section 5.8.1.4 |
| size | 4 | uint64_t | R W | Section 5.8.1.5 | | size | 4 | uint64_t | R W | Section 5.8.1.5 |
| link_support | 5 | bool | R | Section 5.8.1.6 | | link_support | 5 | bool | R | Section 5.8.1.6 |
| symlink_support | 6 | bool | R | Section 5.8.1.7 | | symlink_support | 6 | bool | R | Section 5.8.1.7 |
| named_attr | 7 | bool | R | Section 5.8.1.8 | | named_attr | 7 | bool | R | Section 5.8.1.8 |
| fsid | 8 | fsid4 | R | Section 5.8.1.9 | | fsid | 8 | fsid4 | R | Section 5.8.1.9 |
| unique_handles | 9 | bool | R | Section 5.8.1.10 | | unique_handles | 9 | bool | R | Section 5.8.1.10 |
| lease_time | 10 | nfs_lease4 | R | Section 5.8.1.11 | | lease_time | 10 | nfs_lease4 | R | Section 5.8.1.11 |
| rdattr_error | 11 | enum | R | Section 5.8.1.12 | | rdattr_error | 11 | nfsstat4 | R | Section 5.8.1.12 |
| filehandle | 19 | nfs_fh4 | R | Section 5.8.1.13 | | filehandle | 19 | nfs_fh4 | R | Section 5.8.1.13 |
+-----------------+----+------------+-----+------------------+ +-----------------+----+------------+-----+------------------+
Table 2 Table 2
5.7. RECOMMENDED Attributes - List and Definition References 5.7. RECOMMENDED Attributes - List and Definition References
The RECOMMENDED attributes are defined in Table 3. The meanings of The RECOMMENDED attributes are defined in Table 3. The meanings of
the column headers are the same as Table 2; see Section 5.6 for the the column headers are the same as Table 2; see Section 5.6 for the
meanings. meanings.
skipping to change at page 43, line 20 skipping to change at page 41, line 40
5.8.1.3. Attribute 2: fh_expire_type 5.8.1.3. Attribute 2: fh_expire_type
Server uses this to specify filehandle expiration behavior to the Server uses this to specify filehandle expiration behavior to the
client. See Section 4 for additional description. client. See Section 4 for additional description.
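A client-side check of the fh_expire_type bitmask might look like the sketch below. The numeric values are assumptions believed to match the companion XDR description and should be verified against it. Only bits named in Section 4 of this text are listed, and the helper names are hypothetical.

```python
# Assumed values; verify against the companion XDR description.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004


def is_persistent(mask: int) -> bool:
    """True when none of the volatility bits are set."""
    return mask == FH4_PERSISTENT


def is_consistent(mask: int) -> bool:
    """Check the rule that FH4_NOEXPIRE_WITH_OPEN may only be set
    together with FH4_VOLATILE_ANY."""
    if mask & FH4_NOEXPIRE_WITH_OPEN:
        return bool(mask & FH4_VOLATILE_ANY)
    return True
```

A client that sees a persistent value can skip the bookkeeping needed for filehandle recovery. For any other value it would need to retain path component names as described in Section 4.3.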
5.8.1.4. Attribute 3: change 5.8.1.4. Attribute 3: change
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
file data, directory contents, or attributes of the object have been file data, directory contents, or attributes of the object have been
modified. The server may return the object's time_metadata attribute modified. The server MAY return the object's time_metadata attribute
for this attribute's value but only if the file system object cannot for this attribute's value but only if the file system object cannot
be updated more frequently than the resolution of time_metadata. be updated more frequently than the resolution of time_metadata.
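One way a client could use the change attribute is sketched below. The CachedObject type and the function name are hypothetical. The sketch tests only for equality and makes no assumption that change values are ordered.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    change: int   # change attribute observed when the data was cached
    data: bytes   # cached file data


def is_cache_valid(entry: CachedObject, current_change: int) -> bool:
    """Cached data is trusted only if the change attribute is unchanged."""
    return entry.change == current_change
```

When the check fails, the client would discard the cached data and attributes and fetch them again.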
5.8.1.5. Attribute 4: size 5.8.1.5. Attribute 4: size
The size of the object in bytes. The size of the object in bytes.
5.8.1.6. Attribute 5: link_support 5.8.1.6. Attribute 5: link_support
TRUE, if the object's file system supports hard links. TRUE, if the object's file system supports hard links.
skipping to change at page 48, line 18 skipping to change at page 46, line 30
space beyond the current allocation that can be allocated to this space beyond the current allocation that can be allocated to this
file or directory before further allocations will be refused. It is file or directory before further allocations will be refused. It is
understood that this space may be consumed by allocations to other understood that this space may be consumed by allocations to other
files or directories. files or directories.
5.8.2.25. Attribute 39: quota_avail_soft 5.8.2.25. Attribute 39: quota_avail_soft
The value in bytes that represents the amount of additional disk The value in bytes that represents the amount of additional disk
space that can be allocated to this file or directory before the user space that can be allocated to this file or directory before the user
may reasonably be warned. It is understood that this space may be may reasonably be warned. It is understood that this space may be
consumed by allocations to other files or directories though there is consumed by allocations to other files or directories though there
a rule as to which other files or directories. may exist server side rules as to which other files or directories.
5.8.2.26. Attribute 40: quota_used 5.8.2.26. Attribute 40: quota_used
The value in bytes that represents the amount of disc space used by The value in bytes that represents the amount of disk space used by
this file or directory and possibly a number of other similar files this file or directory and possibly a number of other similar files
or directories, where the set of "similar" meets at least the or directories, where the set of "similar" meets at least the
criterion that allocating space to any file or directory in the set criterion that allocating space to any file or directory in the set
will reduce the "quota_avail_hard" of every other file or directory will reduce the "quota_avail_hard" of every other file or directory
in the set. in the set.
Note that there may be a number of distinct but overlapping sets of Note that there may be a number of distinct but overlapping sets of
files or directories for which a quota_used value is maintained, files or directories for which a quota_used value is maintained,
e.g., "all files with a given owner", "all files with a given group e.g., "all files with a given owner", "all files with a given group
owner", etc. The server is at liberty to choose any of those sets owner", etc. The server is at liberty to choose any of those sets
skipping to change at page 51, line 46 skipping to change at page 50, line 9
as valid a set of users for at least one domain. A server may treat as valid a set of users for at least one domain. A server may treat
other domains as having no valid translations. A more general other domains as having no valid translations. A more general
service is provided when a server is capable of accepting users for service is provided when a server is capable of accepting users for
multiple domains, or for all domains, subject to security multiple domains, or for all domains, subject to security
constraints. constraints.
As an implementation guide, both clients and servers may provide a As an implementation guide, both clients and servers may provide a
means to configure the "dns_domain" portion of the owner string. For means to configure the "dns_domain" portion of the owner string. For
example, the DNS domain name might be "lab.example.org", but the user example, the DNS domain name might be "lab.example.org", but the user
names are defined in "example.org". In the absence of such a names are defined in "example.org". In the absence of such a
configuration, or as a default, the current DNS domain name should be configuration, or as a default, the current DNS domain name of the
the value used for the "dns_domain". server should be the value used for the "dns_domain".
As mentioned above, it is desirable that a server when accepting a As mentioned above, it is desirable that a server when accepting a
string of the form user@domain or group@domain in an attribute, string of the form user@domain or group@domain in an attribute,
return this same string when that corresponding attribute is fetched. return this same string when that corresponding attribute is fetched.
Internationalization issues (for a general discussion of which see Internationalization issues (for a general discussion of which see
Section 12) make this impossible and the client needs to take note of Section 12) may make this impossible and the client needs to take
the following situations: note of the following situations:
o The string representing the domain may be converted to equivalent o The string representing the domain may be converted to equivalent
U-label, if presented using a form other a a U-label. See U-label (see [RFC5890]), if presented using a form other than a
Section 12.6 for details. U-label. See Section 12.4 for details.
o The user or group may be returned in a different form, due to o The user or group may be returned in a different form, due to
normalization issues, although it will always be a canonically normalization issues, although it will always be a canonically
equivalent string. See Section 12.7.3 for details. equivalent string.
In the case where there is no translation available to the client or In the case where there is no translation available to the client or
server, the attribute value will be constructed without the "@". server, the attribute value will be constructed without the "@".
Therefore, the absence of the "@" from the owner or owner_group Therefore, the absence of the "@" from the owner or owner_group
attribute signifies that no translation was available at the sender attribute signifies that no translation was available at the sender
and that the receiver of the attribute should not use that string as and that the receiver of the attribute should not use that string as
a basis for translation into its own internal format. Even though a basis for translation into its own internal format. Even though
the attribute value cannot be translated, it may still be useful. In the attribute value cannot be translated, it may still be useful. In
the case of a client, the attribute string may be used for local the case of a client, the attribute string may be used for local
display of ownership. display of ownership.
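The rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the protocol definition: the function name split_owner is hypothetical, and treating a string with an empty name or empty domain as untranslatable is an assumption made here for the example.

```python
from typing import Optional, Tuple


def split_owner(owner: str) -> Optional[Tuple[str, str]]:
    """Return (name, domain) for a translatable owner string, else None.

    A string without "@" signals that the sender had no translation
    available, so the receiver must not map it to a local identity;
    it may still be shown to the user as-is.
    """
    if "@" not in owner:
        return None
    # Split on the last "@" so the domain part is always the suffix.
    name, _, domain = owner.rpartition("@")
    if not name or not domain:
        # Assumption for this sketch: malformed strings are not translated.
        return None
    return name, domain
```

A caller would fall back to displaying the raw string whenever None is returned, rather than attempting any local mapping.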
skipping to change at page 53, line 14 skipping to change at page 51, line 25
the client should only use named identifiers of the form "user@ the client should only use named identifiers of the form "user@
dns_domain". dns_domain".
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
5.10. Character Case Attributes 5.10. Character Case Attributes
With respect to the case_insensitive and case_preserving attributes, With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive each Universal Multiple-octet coded Character Set-4 (UCS-4)
name" RFC1345 [RFC1345] which may or may not include the word [ISO.10646-1.1993] character (which UTF-8 encodes) has a "long
"CAPITAL" or "SMALL". The presence of SMALL or CAPITAL allows an NFS descriptive name" RFC1345 [RFC1345] which may or may not include the
server to implement unambiguous and efficient table driven mappings word "CAPITAL" or "SMALL". The presence of SMALL or CAPITAL allows
for case insensitive comparisons, and non-case-preserving storage, an NFS server to implement unambiguous and efficient table driven
although there are variations that occur when additional characters with a mappings for case insensitive comparisons, and non-case-preserving
name including "SMALL" or "CAPITAL" are added in a subsequent version storage, although there are variations that occur when additional
of Unicode. characters with a name including "SMALL" or "CAPITAL" are added in a
subsequent version of Unicode.
For general character handling and internationalization issues, see
Section 12. For details regarding case mapping, see the section
Case-based Mapping Used for Component4 Strings.
6. Access Control Attributes 6. Access Control Attributes
Access Control Lists (ACLs) are file attributes that specify fine Access Control Lists (ACLs) are file attributes that specify fine
grained access control. This chapter covers the "acl", "aclsupport", grained access control. This chapter covers the "acl", "aclsupport",
"mode", file attributes, and their interactions. Note that file "mode", file attributes, and their interactions. Note that file
attributes may apply to any file system object. attributes may apply to any file system object.
6.1. Goals 6.1. Goals
skipping to change at page 54, line 43 skipping to change at page 53, line 4
can use the OPEN or ACCESS operations to check access without can use the OPEN or ACCESS operations to check access without
modifying or reading data or metadata. modifying or reading data or metadata.
The NFS ACE structure is defined as follows: The NFS ACE structure is defined as follows:
typedef uint32_t acetype4; typedef uint32_t acetype4;
typedef uint32_t aceflag4; typedef uint32_t aceflag4;
typedef uint32_t acemask4; typedef uint32_t acemask4;
struct nfsace4 { struct nfsace4 {
acetype4 type; acetype4 type;
aceflag4 flag; aceflag4 flag;
acemask4 access_mask; acemask4 access_mask;
utf8val_REQUIRED4 who; utf8str_mixed who;
}; };
To determine if a request succeeds, the server processes each nfsace4 To determine if a request succeeds, the server processes each nfsace4
entry in order. Only ACEs which have a "who" that matches the entry in order. Only ACEs which have a "who" that matches the
requester are considered. Each ACE is processed until all of the requester are considered. Each ACE is processed until all of the
bits of the requester's access have been ALLOWED. Once a bit (see bits of the requester's access have been ALLOWED. Once a bit (see
below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer
considered in the processing of later ACEs. If an ACCESS_DENIED_ACE considered in the processing of later ACEs. If an ACCESS_DENIED_ACE
is encountered where the requester's access still has unALLOWED bits is encountered where the requester's access still has unALLOWED bits
in common with the "access_mask" of the ACE, the request is denied. in common with the "access_mask" of the ACE, the request is denied.
When the ACL is fully processed, if there are bits in the requester's When the ACL is fully processed, if there are bits in the requester's
mask that have not been ALLOWED or DENIED, access is denied. mask that have not been ALLOWED or DENIED, access is denied.
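The ordered evaluation described above can be sketched as follows. This is a hedged illustration only: ACEs are modeled as (type, access_mask, who) tuples, the function name check_access and the caller-supplied matches predicate are hypothetical, and AUDIT and ALARM ACEs are simply skipped. A real server may also permit some bits that the ACL neither allows nor denies, which this sketch does not model.

```python
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001


def check_access(acl, requested, matches):
    """Evaluate an ordered ACL against a requested access mask.

    acl       -- iterable of (ace_type, access_mask, who) tuples
    requested -- bitmask of the access bits the requester wants
    matches   -- callable(who) -> bool, True if the ACE applies
    """
    remaining = requested  # bits not yet ALLOWED
    for ace_type, access_mask, who in acl:
        if not matches(who):
            continue  # only ACEs whose "who" matches are considered
        if ace_type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            remaining &= ~access_mask  # these bits are now settled
        elif ace_type == ACE4_ACCESS_DENIED_ACE_TYPE:
            if remaining & access_mask:
                return False  # a still-unALLOWED bit is denied
        if remaining == 0:
            return True  # every requested bit has been ALLOWED
    # Bits neither ALLOWED nor DENIED result in denial.
    return remaining == 0
```

Because an ALLOWED bit is removed from consideration, a later DENY for the same bit has no effect; the order of ACEs therefore matters.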
skipping to change at page 55, line 46 skipping to change at page 54, line 5
ALLOWED or DENIED by an ACL must be denied. For example, a UNIX- ALLOWED or DENIED by an ACL must be denied. For example, a UNIX-
style server might choose to silently allow read attribute style server might choose to silently allow read attribute
permissions even though an ACL does not explicitly allow those permissions even though an ACL does not explicitly allow those
permissions. (An ACL that explicitly denies permission to read permissions. (An ACL that explicitly denies permission to read
attributes should still be rejected.) attributes should still be rejected.)
The situation is complicated by the fact that a server may have The situation is complicated by the fact that a server may have
multiple modules that enforce ACLs. For example, the enforcement for multiple modules that enforce ACLs. For example, the enforcement for
NFSv4.0 access may be different from, but not weaker than, the NFSv4.0 access may be different from, but not weaker than, the
enforcement for local access, and both may be different from the enforcement for local access, and both may be different from the
enforcement for access through other protocols such as SMB. So it enforcement for access through other protocols such as Server Message
may be useful for a server to accept an ACL even if not all of its Block (SMB). So it may be useful for a server to accept an ACL even
modules are able to support it. if not all of its modules are able to support it.
The guiding principle with regard to NFSv4 access is that the server The guiding principle with regard to NFSv4 access is that the server
must not accept ACLs that appear to make access to the file more must not accept ACLs that appear to make access to the file more
restrictive than it really is. restrictive than it really is.
6.2.1.1. ACE Type 6.2.1.1. ACE Type
The constants used for the type field (acetype4) are as follows: The constants used for the type field (acetype4) are as follows:
const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000;
const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001;
const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002;
const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003;
All four but types are permitted in the acl attribute. All four bit types are permitted in the acl attribute.
+------------------------------+--------------+---------------------+ +------------------------------+--------------+---------------------+
| Value | Abbreviation | Description | | Value | Abbreviation | Description |
+------------------------------+--------------+---------------------+ +------------------------------+--------------+---------------------+
| ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants |
| | | the access defined | | | | the access defined |
| | | in acemask4 to the | | | | in acemask4 to the |
| | | file or directory. | | | | file or directory. |
| ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | | ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies |
| | | the access defined | | | | the access defined |
skipping to change at page 62, line 23 skipping to change at page 60, line 39
Operation(s) affected: Operation(s) affected:
REMOVE REMOVE
RENAME RENAME
Discussion: Discussion:
Permission to delete a file or directory within a directory. Permission to delete a file or directory within a directory.
See Section 6.2.1.3.2 for information on ACE4_DELETE and See Section 6.2.1.3.2 for information on how ACE4_DELETE and
ACE4_DELETE_CHILD interact. ACE4_DELETE_CHILD interact.
ACE4_READ_ATTRIBUTES ACE4_READ_ATTRIBUTES
Operation(s) affected: Operation(s) affected:
GETATTR of file system object attributes GETATTR of file system object attributes
VERIFY VERIFY
NVERIFY NVERIFY
READDIR READDIR
Discussion: Discussion:
The ability to read basic attributes (non-ACLs) of a file. On The ability to read basic attributes (non-ACLs) of a file. On
a UNIX system, basic attributes can be thought of as the stat a UNIX system, basic attributes can be thought of as the stat
level attributes. Allowing this access mask bit would mean the level attributes. Allowing this access mask bit would mean the
entity can execute "ls -l" and stat. If a READDIR operation entity can execute "ls -l" and stat. If a READDIR operation
skipping to change at page 71, line 42 skipping to change at page 69, line 49
6.4. Requirements 6.4. Requirements
The server that supports both mode and ACL must take care to The server that supports both mode and ACL must take care to
synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the
ACEs which have respective who fields of "OWNER@", "GROUP@", and ACEs which have respective who fields of "OWNER@", "GROUP@", and
"EVERYONE@" so that the client can see semantically equivalent access "EVERYONE@" so that the client can see semantically equivalent access
permissions exist whether the client asks for owner, owner_group and permissions exist whether the client asks for owner, owner_group and
mode attributes, or for just the ACL. mode attributes, or for just the ACL.
In this section, much is made of the methods in Section 6.3.2. Many Many requirements refer to Section 6.3.2, but note that the methods
requirements refer to this section. But note that the methods have have behaviors specified with "SHOULD". This is intentional, to
behaviors specified with "SHOULD". This is intentional, to avoid avoid invalidating existing implementations that compute the mode
invalidating existing implementations that compute the mode according according to the withdrawn POSIX ACL draft ([P1003.1e]), rather than
to the withdrawn POSIX ACL draft (1003.1e draft 17), rather than by by actual permissions on owner, group, and other.
actual permissions on owner, group, and other.
6.4.1. Setting the mode and/or ACL Attributes 6.4.1. Setting the mode and/or ACL Attributes
6.4.1.1. Setting mode and not ACL 6.4.1.1. Setting mode and not ACL
When any of the nine low-order mode bits are changed because the mode When any of the nine low-order mode bits are changed because the mode
attribute was set, and no ACL attribute is explicitly set, the acl attribute was set, and no ACL attribute is explicitly set, the acl
attribute must be modified in accordance with the updated value of attribute must be modified in accordance with the updated value of
those bits. This must happen even if the value of the low-order bits those bits. This must happen even if the value of the low-order bits
is the same after the mode is set as before. is the same after the mode is set as before.
skipping to change at page 74, line 6 skipping to change at page 72, line 6
6.4.3. Creating New Objects 6.4.3. Creating New Objects
If a server supports any ACL attributes, it may use the ACL If a server supports any ACL attributes, it may use the ACL
attributes on the parent directory to compute an initial ACL attributes on the parent directory to compute an initial ACL
attribute for a newly created object. This will be referred to as attribute for a newly created object. This will be referred to as
the inherited ACL within this section. The act of adding one or more the inherited ACL within this section. The act of adding one or more
ACEs to the inherited ACL that are based upon ACEs in the parent ACEs to the inherited ACL that are based upon ACEs in the parent
directory's ACL will be referred to as inheriting an ACE within this directory's ACL will be referred to as inheriting an ACE within this
section. section.
Implementors should standardize on what the behavior of CREATE and In the presence or absence of the mode and ACL attributes, the
OPEN must be depending on the presence or absence of the mode and ACL behavior of CREATE and OPEN SHOULD be:
attributes.
1. If just the mode is given in the call: 1. If just the mode is given in the call:
In this case, inheritance SHOULD take place, but the mode MUST be In this case, inheritance SHOULD take place, but the mode MUST be
applied to the inherited ACL as described in Section 6.4.1.1, applied to the inherited ACL as described in Section 6.4.1.1,
thereby modifying the ACL. thereby modifying the ACL.
2. If just the ACL is given in the call: 2. If just the ACL is given in the call:
In this case, inheritance SHOULD NOT take place, and the ACL as In this case, inheritance SHOULD NOT take place, and the ACL as
skipping to change at page 75, line 28 skipping to change at page 73, line 28
directories. directories.
When a new directory is created, the server MAY split any inherited When a new directory is created, the server MAY split any inherited
ACE which is both inheritable and effective (in other words, which ACE which is both inheritable and effective (in other words, which
has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE
set), into two ACEs, one with no inheritance flags, and one with set), into two ACEs, one with no inheritance flags, and one with
ACE4_INHERIT_ONLY_ACE set. This makes it simpler to modify the ACE4_INHERIT_ONLY_ACE set. This makes it simpler to modify the
effective permissions on the directory without modifying the ACE effective permissions on the directory without modifying the ACE
which is to be inherited to the new directory's children. which is to be inherited to the new directory's children.
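The optional split described above might look like the following sketch. ACEs are modeled here as (type, flag, access_mask, who) tuples and the function name split_inherited_ace is hypothetical; the flag constants are the aceflag4 values used by the protocol. The sketch treats an ACE as inheritable when it carries a file or directory inheritance flag.

```python
ACE4_FILE_INHERIT_ACE = 0x00000001
ACE4_DIRECTORY_INHERIT_ACE = 0x00000002
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x00000004
ACE4_INHERIT_ONLY_ACE = 0x00000008

INHERIT_FLAGS = ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE
BLOCKING_FLAGS = ACE4_INHERIT_ONLY_ACE | ACE4_NO_PROPAGATE_INHERIT_ACE


def split_inherited_ace(ace):
    """Split an inheritable, effective ACE into two ACEs.

    Returns a list holding either the original ACE (no split applies)
    or an effective ACE without inheritance flags followed by an
    inherit-only ACE that carries the inheritance forward.
    """
    ace_type, flag, access_mask, who = ace
    if not (flag & INHERIT_FLAGS) or (flag & BLOCKING_FLAGS):
        return [ace]
    effective = (ace_type, flag & ~INHERIT_FLAGS, access_mask, who)
    inherit_only = (ace_type, flag | ACE4_INHERIT_ONLY_ACE, access_mask, who)
    return [effective, inherit_only]
```

After the split, the effective ACE can be edited (for example by a mode change) without touching the ACE that the directory's children will inherit.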
7. Multi-Server Namespace 7. NFS Server Name Space
7.1. Server Exports
On a UNIX server the name space describes all the files reachable by
pathnames under the root directory or "/". On a Windows NT server
the name space constitutes all the files on disks named by mapped
disk letters. NFS server administrators rarely make the entire
server's file system name space available to NFS clients. More often
portions of the name space are made available via an "export"
feature. In previous versions of the NFS protocol, the root
filehandle for each export is obtained through the MOUNT protocol;
the client sends a string that identifies the exported portion of the name space
and the server returns the root filehandle for it. The MOUNT
protocol supports an EXPORTS procedure that will enumerate the
server's exports.
7.2. Browsing Exports
The NFSv4 protocol provides a root filehandle that clients can use to
obtain filehandles for these exports via a multi-component LOOKUP. A
common user experience is to use a graphical user interface (perhaps
a file "Open" dialog window) to find a file via progressive browsing
through a directory tree. The client must be able to move from one
export to another export via single-component, progressive LOOKUP
operations.
This style of browsing is not well supported by the NFSv2 and NFSv3
protocols. The client expects all LOOKUP operations to remain within
a single server file system. For example, the device attribute will
not change. This prevents a client from taking name space paths that
span exports.
An automounter on the client can obtain a snapshot of the server's
name space using the EXPORTS procedure of the MOUNT protocol. If it
understands the server's pathname syntax, it can create an image of
the server's name space on the client. The parts of the name space
that are not exported by the server are filled in with a "pseudo file
system" that allows the user to browse from one mounted file system
to another. There is a drawback to this representation of the
server's name space on the client: it is static. If the server
administrator adds a new export the client will be unaware of it.
7.3. Server Pseudo File System
NFSv4 servers avoid this name space inconsistency by presenting all
the exports within the framework of a single server name space. An
NFSv4 client uses LOOKUP and READDIR operations to browse seamlessly
from one export to another. Portions of the server name space that
are not exported are bridged via a "pseudo file system" that provides
a view of exported directories only. A pseudo file system has a
unique fsid and behaves like a normal, read only file system.
Based on the construction of the server's name space, it is possible
that multiple pseudo file systems may exist. For example,
/a pseudo file system
/a/b real file system
/a/b/c pseudo file system
/a/b/c/d real file system
Each of the pseudo file systems is considered a separate entity and
therefore will have a unique fsid.
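One way to picture how the layout above arises is to classify every path on the way to each export. The sketch below is illustrative only: the function name classify_namespace is hypothetical, and it makes the simplifying assumption that any directory lying on the path to an export, and not itself an export root, is bridged by a pseudo file system.

```python
import posixpath


def classify_namespace(exports):
    """Map each path on the way to an export to "real" or "pseudo".

    exports -- iterable of absolute paths of exported file systems
    """
    kinds = {}
    for export in exports:
        path = posixpath.normpath(export)
        # Walk upward; every ancestor is tentatively a pseudo node.
        while True:
            parent = posixpath.dirname(path)
            if parent == path:  # reached "/"
                break
            kinds.setdefault(parent, "pseudo")
            path = parent
    # Export roots are real file systems, overriding any tentative mark.
    for export in exports:
        kinds[posixpath.normpath(export)] = "real"
    return kinds
```

With exports /a/b and /a/b/c/d this yields pseudo entries for /, /a, and /a/b/c, and real entries for the two exports, matching the example above.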
7.4. Multiple Roots
The DOS and Windows operating environments are sometimes described as
having "multiple roots". File systems are commonly represented as
disk letters. MacOS represents file systems as top level names.
NFSv4 servers for these platforms can construct a pseudo file system
above these root names so that disk letters or volume names are
simply directory names in the pseudo root.
7.5. Filehandle Volatility
The nature of the server's pseudo file system is that it is a logical
representation of file system(s) available from the server.
Therefore, the pseudo file system is most likely constructed
dynamically when the server is first instantiated. It is expected
that the pseudo file system may not have an on disk counterpart from
which persistent filehandles could be constructed. Even though it is
preferable that the server provide persistent filehandles for the
pseudo file system, the NFS client should expect that pseudo file
system filehandles are volatile. This can be confirmed by checking
the associated "fh_expire_type" attribute for those filehandles in
question. If the filehandles are volatile, the NFS client must be
prepared to recover a filehandle value (e.g., with a multi-component
LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED.
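A client-side recovery loop for volatile filehandles could be sketched as below. Everything here is hypothetical scaffolding for illustration: FileHandleExpired stands in for an NFS4ERR_FHEXPIRED result, op is whatever operation uses the filehandle, and relookup represents the multi-component LOOKUP that obtains a fresh filehandle.

```python
class FileHandleExpired(Exception):
    """Stand-in for a reply carrying NFS4ERR_FHEXPIRED."""


def with_fh_recovery(op, relookup, filehandle, retries=1):
    """Run op(filehandle), recovering the filehandle if it has expired.

    op        -- callable taking a filehandle and returning a result
    relookup  -- callable returning a fresh filehandle for the same path
    retries   -- how many recoveries to attempt before giving up
    """
    for attempt in range(retries + 1):
        try:
            return op(filehandle)
        except FileHandleExpired:
            if attempt == retries:
                raise  # recovery did not help; surface the error
            filehandle = relookup()
```

A client that has checked fh_expire_type and found the filehandle persistent would have no need for such a wrapper.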
7.6. Exported Root
If the server's root file system is exported, one might conclude that
a pseudo-file system is not needed. This would be wrong. Assume the
following file systems on a server:
/ disk1 (exported)
/a disk2 (not exported)
/a/b disk3 (exported)
Because disk2 is not exported, disk3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-file system.
7.7. Mount Point Crossing
The server file system environment may be constructed in such a way
that one file system contains a directory which is 'covered' or
mounted upon by a second file system. For example:
/a/b (file system 1)
/a/b/c/d (file system 2)
The pseudo file system for this server may be constructed to look
like:
/ (place holder/not exported)
/a/b (file system 1)
/a/b/c/d (file system 2)
It is the server's responsibility to present a pseudo file system
that is complete to the client. If the client sends a lookup request
for the path "/a/b/c/d", the server's response is the filehandle of
the file system "/a/b/c/d". In previous versions of the NFS
protocol, the server would respond with the filehandle of directory
"/a/b/c/d" within the file system "/a/b".
The NFS client will be able to determine if it crosses a server mount
point by a change in the value of the "fsid" attribute.
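Detecting a crossing amounts to comparing the fsid of each object with that of the object before it on the path. The sketch below assumes the client has already fetched the fsid (a major/minor pair) for each path component; the Fsid type and the function name mount_point_crossings are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Fsid(NamedTuple):
    major: int
    minor: int


def mount_point_crossings(walk: List[Tuple[str, Fsid]]) -> List[str]:
    """Return the component names at which the fsid changes.

    walk -- (name, fsid) pairs in the order visited by LOOKUP
    """
    crossings = []
    for (_, prev_fsid), (name, fsid) in zip(walk, walk[1:]):
        if fsid != prev_fsid:
            crossings.append(name)  # entered a different file system
    return crossings
```

The first component has no predecessor in the walk, so it is never reported as a crossing.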
7.8. Security Policy and Name Space Presentation
The application of the server's security policy needs to be carefully
considered by the implementor. One may choose to limit the
viewability of portions of the pseudo file system based on the
server's perception of the client's ability to authenticate itself
properly. However, with the support of multiple security mechanisms
and the ability to negotiate the appropriate use of these mechanisms,
the server is unable to properly determine if a client will be able
to authenticate itself. If, based on its policies, the server
chooses to limit the contents of the pseudo file system, the server
may effectively hide file systems from a client that may otherwise
have legitimate access.
As suggested practice, the server should apply the security policy of
a shared resource in the server's namespace to the components of the
resource's ancestors. For example:
/
/a/b
/a/b/c
The /a/b/c directory is a real file system and is the shared
resource. The security policy for /a/b/c is Kerberos with integrity.
The server should apply the same security policy to /, /a, and /a/b.
This allows for the extension of the protection of the server's
namespace to the ancestors of the real shared resource.
For the case of the use of multiple, disjoint security mechanisms in
the server's resources, the security for a particular object in the
server's namespace should be the union of all security mechanisms of
all direct descendants.
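The suggested practice can be illustrated by propagating each shared resource's mechanisms up to its ancestors. This is a sketch under stated assumptions: the function name ancestor_policies is hypothetical, mechanisms are represented as plain strings such as "krb5i", and the result describes only what the ancestors inherit from their descendants, not any policy an ancestor may have of its own.

```python
import posixpath


def ancestor_policies(exports):
    """Compute, for each ancestor path, the union of the security
    mechanisms of the shared resources beneath it.

    exports -- dict mapping an exported path to a set of mechanism names
    """
    policies = {}
    for path, mechanisms in exports.items():
        node = posixpath.dirname(path)
        if node == path:
            continue  # the root has no ancestors
        while True:
            policies.setdefault(node, set()).update(mechanisms)
            parent = posixpath.dirname(node)
            if parent == node:  # reached "/"
                break
            node = parent
    return policies
```

Taking the union over all descendants gives the same result as repeatedly taking the union over direct descendants, so a client holding any one of the mechanisms can still traverse toward the resource it is entitled to reach.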
8. Multi-Server Namespace
NFSv4 supports attributes that allow a namespace to extend beyond the NFSv4 supports attributes that allow a namespace to extend beyond the
boundaries of a single server. It is RECOMMENDED that clients and boundaries of a single server. It is RECOMMENDED that clients and
servers support construction of such multi-server namespaces. Use of servers support construction of such multi-server namespaces. Use of
such multi-server namespaces is OPTIONAL, however, and for many such multi-server namespaces is OPTIONAL, however, and for many
purposes, single-server namespaces are perfectly acceptable. Use of purposes, single-server namespaces are perfectly acceptable. Use of
multi-server namespaces can provide many advantages, however, by multi-server namespaces can provide many advantages, however, by
separating a file system's logical position in a namespace from the separating a file system's logical position in a namespace from the
(possibly changing) logistical and administrative considerations that (possibly changing) logistical and administrative considerations that
result in particular file systems being located on particular result in particular file systems being located on particular
servers. servers.
7.1. Location Attributes 8.1. Location Attributes
NFSv4 contains RECOMMENDED attributes that allow file systems on one NFSv4 contains RECOMMENDED attributes that allow file systems on one
server to be associated with one or more instances of that file server to be associated with one or more instances of that file
system on other servers. These attributes specify such file system system on other servers. These attributes specify such file system
instances by specifying a server address target (either as a DNS name instances by specifying a server address target (either as a DNS name
representing one or more IP addresses or as a literal IP address) representing one or more IP addresses or as a literal IP address)
together with the path of that file system within the associated together with the path of that file system within the associated
single-server namespace. single-server namespace.
The fs_locations RECOMMENDED attribute allows specification of the The fs_locations RECOMMENDED attribute allows specification of the
file system locations where the data corresponding to a given file file system locations where the data corresponding to a given file
system may be found. system may be found.
7.2. File System Presence or Absence 8.2. File System Presence or Absence
A given location in an NFSv4 namespace (typically but not necessarily A given location in an NFSv4 namespace (typically but not necessarily
a multi-server namespace) can have a number of file system instance a multi-server namespace) can have a number of file system instance
locations associated with it via the fs_locations attribute. There locations associated with it via the fs_locations attribute. There
may also be an actual current file system at that location, may also be an actual current file system at that location,
accessible via normal namespace operations (e.g., LOOKUP). In this accessible via normal namespace operations (e.g., LOOKUP). In this
case, the file system is said to be "present" at that position in the case, the file system is said to be "present" at that position in the
namespace, and clients will typically use it, reserving use of namespace, and clients will typically use it, reserving use of
additional locations specified via the location-related attributes to additional locations specified via the location-related attributes to
situations in which the principal location is no longer available. situations in which the principal location is no longer available.
skipping to change at page 77, line 9 skipping to change at page 78, line 26
subsequently. subsequently.
It should be noted that because the check for the current filehandle It should be noted that because the check for the current filehandle
being within an absent file system happens at the start of every being within an absent file system happens at the start of every
operation, operations that change the current filehandle so that it operation, operations that change the current filehandle so that it
is within an absent file system will not result in an error. This is within an absent file system will not result in an error. This
allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be allows such combinations as PUTFH-GETATTR and LOOKUP-GETATTR to be
used to get attribute information, particularly location attribute used to get attribute information, particularly location attribute
information, as discussed below. information, as discussed below.
7.3. Getting Attributes for an Absent File System 8.3. Getting Attributes for an Absent File System
When a file system is absent, most attributes are not available, but When a file system is absent, most attributes are not available, but
it is necessary to allow the client access to the small set of it is necessary to allow the client access to the small set of
attributes that are available, and most particularly that which gives attributes that are available, and most particularly that which gives
information about the correct current locations for this file system, information about the correct current locations for this file system,
fs_locations. fs_locations.
7.3.1. GETATTR Within an Absent File System 8.3.1. GETATTR Within an Absent File System
As mentioned above, an exception is made for GETATTR in that As mentioned above, an exception is made for GETATTR in that
attributes may be obtained for a filehandle within an absent file attributes may be obtained for a filehandle within an absent file
system. This exception only applies if the attribute mask contains system. This exception only applies if the attribute mask contains
at least the fs_locations attribute bit, which indicates the client at least the fs_locations attribute bit, which indicates the client
is interested in a result regarding an absent file system. If it is is interested in a result regarding an absent file system. If it is
not requested, GETATTR will result in an NFS4ERR_MOVED error. not requested, GETATTR will result in an NFS4ERR_MOVED error.
When a GETATTR is done on an absent file system, the set of supported When a GETATTR is done on an absent file system, the set of supported
attributes is very limited. Many attributes, including those that attributes is very limited. Many attributes, including those that
skipping to change at page 78, line 15 skipping to change at page 79, line 34
supported, GETATTR will not return an error, but will return the mask supported, GETATTR will not return an error, but will return the mask
of the actual attributes supported with the results. of the actual attributes supported with the results.
Handling of VERIFY/NVERIFY is similar to GETATTR in that if the Handling of VERIFY/NVERIFY is similar to GETATTR in that if the
attribute mask does not include fs_locations the error NFS4ERR_MOVED attribute mask does not include fs_locations the error NFS4ERR_MOVED
will result. It differs in that any appearance in the attribute mask will result. It differs in that any appearance in the attribute mask
of an attribute not supported for an absent file system (and note of an attribute not supported for an absent file system (and note
that this will include some normally REQUIRED attributes) will also that this will include some normally REQUIRED attributes) will also
cause an NFS4ERR_MOVED result. cause an NFS4ERR_MOVED result.
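The difference between GETATTR and VERIFY/NVERIFY on an absent file system can be sketched as two small checks. Attribute masks are modeled as sets of attribute names and status codes as strings; the function names and the idea of passing in the "available" set are assumptions for illustration, since the exact set of attributes available for an absent file system is defined elsewhere in the specification.

```python
def getattr_absent(requested, available):
    """GETATTR on an absent file system.

    Returns (status, returned_attrs). Without fs_locations in the
    request the result is NFS4ERR_MOVED; otherwise the reply is
    limited to the attributes actually available.
    """
    requested = frozenset(requested)
    if "fs_locations" not in requested:
        return "NFS4ERR_MOVED", frozenset()
    return "NFS4_OK", requested & frozenset(available)


def verify_absent_precheck(requested, available):
    """Pre-check for VERIFY/NVERIFY on an absent file system.

    Returns "NFS4ERR_MOVED" if the operation must fail, or None if
    the attribute comparison may proceed.
    """
    requested = frozenset(requested)
    if "fs_locations" not in requested:
        return "NFS4ERR_MOVED"
    if not requested <= frozenset(available):
        # Unlike GETATTR, any unavailable attribute is an error.
        return "NFS4ERR_MOVED"
    return None
```

GETATTR silently narrows the reply, while VERIFY and NVERIFY cannot compare an attribute that is unavailable and so fail outright.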
7.3.2. READDIR and Absent File Systems 8.3.2. READDIR and Absent File Systems
A READDIR performed when the current filehandle is within an absent A READDIR performed when the current filehandle is within an absent
file system will result in an NFS4ERR_MOVED error, since, unlike the file system will result in an NFS4ERR_MOVED error, since, unlike the
case of GETATTR, no such exception is made for READDIR. case of GETATTR, no such exception is made for READDIR.
Attributes for an absent file system may be fetched via a READDIR for Attributes for an absent file system may be fetched via a READDIR for
a directory in a present file system, when that directory contains a directory in a present file system, when that directory contains
the root directories of one or more absent file systems. In this the root directories of one or more absent file systems. In this
case, the handling is as follows: case, the handling is as follows:
skipping to change at page 78, line 47 skipping to change at page 80, line 17
attributes fs_locations or rdattr_error then the occurrence of the attributes fs_locations or rdattr_error then the occurrence of the
root of an absent file system within the directory will result in root of an absent file system within the directory will result in
the READDIR failing with an NFS4ERR_MOVED error. the READDIR failing with an NFS4ERR_MOVED error.
o The unavailability of an attribute because of a file system's o The unavailability of an attribute because of a file system's
absence, even one that is ordinarily REQUIRED, does not result in absence, even one that is ordinarily REQUIRED, does not result in
any error indication. The set of attributes returned for the root any error indication. The set of attributes returned for the root
directory of the absent file system in that case is simply directory of the absent file system in that case is simply
restricted to those actually available. restricted to those actually available.
7.4. Uses of Location Information 8.4. Uses of Location Information
The location-bearing attribute of fs_locations provides, together The location-bearing attribute of fs_locations provides, together
with the possibility of absent file systems, a number of important with the possibility of absent file systems, a number of important
facilities in providing reliable, manageable, and scalable data facilities in providing reliable, manageable, and scalable data
access. access.
When a file system is present, these attributes can provide When a file system is present, these attributes can provide
alternative locations, to be used to access the same data, in the alternative locations, to be used to access the same data, in the
event of server failures, communications problems, or other event of server failures, communications problems, or other
difficulties that make continued access to the current file system difficulties that make continued access to the current file system
skipping to change at page 79, line 40 skipping to change at page 81, line 10
system location provides a means by which file systems located on one system location provides a means by which file systems located on one
server can be associated with a namespace defined by another server, server can be associated with a namespace defined by another server,
thus allowing a general multi-server namespace facility. A thus allowing a general multi-server namespace facility. A
designation of such a location, in place of an absent file system, is designation of such a location, in place of an absent file system, is
called a "referral". called a "referral".
Because client support for location-related attributes is OPTIONAL, a Because client support for location-related attributes is OPTIONAL, a
server may (but is not required to) take action to hide migration and server may (but is not required to) take action to hide migration and
referral events from such clients, by acting as a proxy, for example. referral events from such clients, by acting as a proxy, for example.
7.4.1. File System Replication 8.4.1. File System Replication
The fs_locations attribute provides alternative locations, to be used The fs_locations attribute provides alternative locations, to be used
to access data in place of or in addition to the current file system to access data in place of or in addition to the current file system
instance. On first access to a file system, the client should obtain instance. On first access to a file system, the client should obtain
the value of the set of alternate locations by interrogating the the value of the set of alternate locations by interrogating the
fs_locations attribute. fs_locations attribute.
In the event that server failures, communications problems, or other In the event that server failures, communications problems, or other
difficulties make continued access to the current file system difficulties make continued access to the current file system
impossible or otherwise impractical, the client can use the alternate impossible or otherwise impractical, the client can use the alternate
skipping to change at page 80, line 22 skipping to change at page 81, line 40
accessing the same physical file system. How these different modes accessing the same physical file system. How these different modes
of file system transition are represented within the fs_locations of file system transition are represented within the fs_locations
attribute and how the client deals with file system transition issues attribute and how the client deals with file system transition issues
will be discussed in detail below. will be discussed in detail below.
Multiple server addresses, whether they are derived from a single Multiple server addresses, whether they are derived from a single
entry with a DNS name representing a set of IP addresses or from entry with a DNS name representing a set of IP addresses or from
multiple entries each with its own server address, may correspond to multiple entries each with its own server address, may correspond to
the same actual server. the same actual server.
7.4.2. File System Migration 8.4.2. File System Migration
When a file system is present and becomes absent, clients can be When a file system is present and becomes absent, clients can be
given the opportunity to have continued access to their data, at an given the opportunity to have continued access to their data, at an
alternate location, as specified by the fs_locations attribute. alternate location, as specified by the fs_locations attribute.
Typically, a client will be accessing the file system in question, Typically, a client will be accessing the file system in question,
get an NFS4ERR_MOVED error, and then use the fs_locations attribute get an NFS4ERR_MOVED error, and then use the fs_locations attribute
to determine the new location of the data. to determine the new location of the data.
Such migration can be helpful in providing load balancing or general Such migration can be helpful in providing load balancing or general
resource reallocation. The protocol does not specify how the file resource reallocation. The protocol does not specify how the file
skipping to change at page 81, line 10 skipping to change at page 82, line 28
it must designate the same data. Where file systems are writable, a it must designate the same data. Where file systems are writable, a
change made on the original file system must be visible on all change made on the original file system must be visible on all
migration targets. Where a file system is not writable but migration targets. Where a file system is not writable but
represents a read-only copy (possibly periodically updated) of a represents a read-only copy (possibly periodically updated) of a
writable file system, similar requirements apply to the propagation writable file system, similar requirements apply to the propagation
of updates. Any change visible in the original file system must of updates. Any change visible in the original file system must
already be effected on all migration targets, to avoid any already be effected on all migration targets, to avoid any
possibility that a client, in effecting a transition to the migration possibility that a client, in effecting a transition to the migration
target, will see any reversion in file system state. target, will see any reversion in file system state.
7.4.3. Referrals 8.4.3. Referrals
Referrals provide a way of placing a file system in a location within Referrals provide a way of placing a file system in a location within
the namespace essentially without respect to its physical location on the namespace essentially without respect to its physical location on
a given server. This allows a single server or a set of servers to a given server. This allows a single server or a set of servers to
present a multi-server namespace that encompasses file systems present a multi-server namespace that encompasses file systems
located on multiple servers. Some likely uses of this include located on multiple servers. Some likely uses of this include
establishment of site-wide or organization-wide namespaces, or even establishment of site-wide or organization-wide namespaces, or even
knitting such together into a truly global namespace. knitting such together into a truly global namespace.
Referrals occur when a client determines, upon first referencing a Referrals occur when a client determines, upon first referencing a
skipping to change at page 81, line 48 skipping to change at page 83, line 18
systems. Alternatively, a single multi-server namespace may be systems. Alternatively, a single multi-server namespace may be
administratively segmented with separate referral file systems (on administratively segmented with separate referral file systems (on
separate servers) for each separately administered portion of the separate servers) for each separately administered portion of the
namespace. The top-level referral file system or any segment may use namespace. The top-level referral file system or any segment may use
replicated referral file systems for higher availability. replicated referral file systems for higher availability.
Generally, multi-server namespaces are for the most part uniform, in Generally, multi-server namespaces are for the most part uniform, in
that the same data made available to one client at a given location that the same data made available to one client at a given location
in the namespace is made available to all clients at that location. in the namespace is made available to all clients at that location.
7.5. Location Entries and Server Identity 8.5. Location Entries and Server Identity
As mentioned above, a single location entry may have a server address As mentioned above, a single location entry may have a server address
target in the form of a DNS name that may represent multiple IP target in the form of a DNS name that may represent multiple IP
addresses, while multiple location entries may have their own server addresses, while multiple location entries may have their own server
address targets that reference the same server. address targets that reference the same server.
When multiple addresses for the same server exist, the client may When multiple addresses for the same server exist, the client may
assume that for each file system in the namespace of a given server assume that for each file system in the namespace of a given server
network address, there exist file systems at corresponding namespace network address, there exist file systems at corresponding namespace
locations for each of the other server network addresses. It may do locations for each of the other server network addresses. It may do
skipping to change at page 82, line 27 skipping to change at page 83, line 45
the client cannot assume that these addresses are multiple paths to the client cannot assume that these addresses are multiple paths to
the same server. In most cases, they will be, but the client MUST the same server. In most cases, they will be, but the client MUST
verify that before acting on that assumption. When two server verify that before acting on that assumption. When two server
addresses are designated by a single location entry and they addresses are designated by a single location entry and they
correspond to different servers, this normally indicates some sort of correspond to different servers, this normally indicates some sort of
misconfiguration, and so the client should avoid using such location misconfiguration, and so the client should avoid using such location
entries when alternatives are available. When they are not, clients entries when alternatives are available. When they are not, clients
should pick one of IP addresses and use it, without using others that should pick one of IP addresses and use it, without using others that
are not directed to the same server. are not directed to the same server.
7.6. Additional Client-Side Considerations 8.6. Additional Client-Side Considerations
When clients make use of servers that implement referrals, When clients make use of servers that implement referrals,
replication, and migration, care should be taken that a user who replication, and migration, care should be taken that a user who
mounts a given file system that includes a referral or a relocated mounts a given file system that includes a referral or a relocated
file system continues to see a coherent picture of that user-side file system continues to see a coherent picture of that user-side
file system despite the fact that it contains a number of server-side file system despite the fact that it contains a number of server-side
file systems that may be on different servers. file systems that may be on different servers.
One important issue is upward navigation from the root of a server- One important issue is upward navigation from the root of a server-
side file system to its parent (specified as ".." in UNIX), in the side file system to its parent (specified as ".." in UNIX), in the
skipping to change at page 83, line 13 skipping to change at page 84, line 31
Another issue concerns refresh of referral locations. When referrals Another issue concerns refresh of referral locations. When referrals
are used extensively, they may change as server configurations are used extensively, they may change as server configurations
change. It is expected that clients will cache information related change. It is expected that clients will cache information related
to traversing referrals so that future client-side requests are to traversing referrals so that future client-side requests are
resolved locally without server communication. This is usually resolved locally without server communication. This is usually
rooted in client-side name look up caching. Clients should rooted in client-side name look up caching. Clients should
periodically purge this data for referral points in order to detect periodically purge this data for referral points in order to detect
changes in location information. changes in location information.
A potential problem exists if a client were to allow an open owner to A potential problem exists if a client were to allow an open owner to
have state on multiple filesystems on server, in that it is unclear have state on multiple file systems on server, in that it is unclear
how the sequence numbers associated with open owners are to be dealt how the sequence numbers associated with open owners are to be dealt
with, in the event of transparent state migration. A client can with, in the event of transparent state migration. A client can
avoid such a situation, if it ensures that any use of an open owner avoid such a situation, if it ensures that any use of an open owner
is confined to a single filesystem. is confined to a single file system.
A server MAY decline to migrate state associated with open owners A server MAY decline to migrate state associated with open owners
that span multiple filesystems. In cases in which the server chooses that span multiple file systems. In cases in which the server
not to migrate such state, the server MUST return NFS4ERR_BAD_STATEID chooses not to migrate such state, the server MUST return
when the client uses those stateids on the new server. NFS4ERR_BAD_STATEID when the client uses those stateids on the new
server.
The server MUST return NFS4ERR_STALE_STATEID when the client uses The server MUST return NFS4ERR_STALE_STATEID when the client uses
those stateids on the old server, regardless of whether migration has those stateids on the old server, regardless of whether migration has
occurred or not. occurred or not.
7.7. Effecting File System Transitions 8.7. Effecting File System Referrals
Transitions between file system instances, whether due to switching
between replicas upon server unavailability or to server-initiated
migration events, are best dealt with together. This is so even
though, for the server, pragmatic considerations will normally force
different implementation strategies for planned and unplanned
transitions. Even though the prototypical use cases of replication
and migration contain distinctive sets of features, when all
possibilities for these operations are considered, there is an
underlying unity of these operations, from the client's point of
view, that makes treating them together desirable.
A number of methods are possible for servers to replicate data and to
track client state in order to allow clients to transition between
file system instances with a minimum of disruption. Such methods
vary between those that use inter-server clustering techniques to
limit the changes seen by the client, to those that are less
aggressive, use more standard methods of replicating data, and impose
a greater burden on the client to adapt to the transition.
The NFSv4 protocol does not impose choices on clients and servers
with regard to that spectrum of transition methods. In fact, there
are many valid choices, depending on client and application
requirements and their interaction with server implementation
choices. The NFSv4.0 protocol does not provide the servers a means
of communicating the transition methods. In the NFSv4.1 protocol
[RFC5661], an additional attribute "fs_locations_info" is presented,
which will define the specific choices that can be made, how these
choices are communicated to the client, and how the client is to deal
with any discontinuities.
In the sections below, references will be made to various possible
server implementation choices as a way of illustrating the transition
scenarios that clients may deal with. The intent here is not to
define or limit server implementations but rather to illustrate the
range of issues that clients may face. Again, as the NFSv4.0
protocol does not have an explicit means of communicating these
issues to the client, the intent is to document the problems that can
be faced in a multi-server name space and allow the client to use the
inferred transitions available via fs_locations and other attributes
(see Section 7.9.1).
In the discussion below, references will be made to a file system
having a particular property or to two file systems (typically the
source and destination) belonging to a common class of any of several
types. Two file systems that belong to such a class share some
important aspects of file system behavior that clients may depend
upon when present, to easily effect a seamless transition between
file system instances. Conversely, where the file systems do not
belong to such a common class, the client has to deal with various
sorts of implementation discontinuities that may cause performance or
other issues in effecting a transition.
While fs_locations is available, default assumptions with regard to
such classifications have to be inferred (see Section 7.9.1 for
details).
In cases in which one server is expected to accept opaque values from
the client that originated from another server, the servers SHOULD
encode the "opaque" values in big-endian byte order. If this is
done, servers acting as replicas or immigrating file systems will be
able to parse values like stateids, directory cookies, filehandles,
etc., even if their native byte order is different from that of other
servers cooperating in the replication and migration of the file
system.
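As a minimal sketch of the big-endian recommendation above, the following Python fragment packs and unpacks an opaque value using an explicit byte order. The field layout (a 32-bit server identifier, a 32-bit boot counter, and a 64-bit sequence value) and the function names are assumptions made purely for illustration; the protocol does not define the internal structure of such values, which remains private to the cooperating servers.

```python
import struct

# Hypothetical layout, for illustration only: the protocol treats these
# bytes as opaque, and cooperating servers agree on the layout privately.
# The leading ">" forces big-endian order regardless of the host CPU.
_OPAQUE_FMT = ">IIQ"


def encode_opaque(server_id: int, boot_count: int, sequence: int) -> bytes:
    """Pack the three fields into 16 bytes in big-endian order."""
    return struct.pack(_OPAQUE_FMT, server_id, boot_count, sequence)


def decode_opaque(raw: bytes) -> tuple:
    """Unpack a value produced by encode_opaque on any cooperating server."""
    return struct.unpack(_OPAQUE_FMT, raw)
```

Because the byte order is fixed in the format string rather than taken from the host, a little-endian replica can decode a value produced by a big-endian source server, and vice versa.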
7.7.1. File System Transitions and Simultaneous Access
When a single file system may be accessed at multiple locations,
because of an indication of file system identity as reported by the
fs_locations attribute, the client will, depending on specific
circumstances as discussed below, either:
o Access multiple instances simultaneously, each of which represents
an alternate path to the same data and metadata.
o Access one instance (or set of instances) and then transition to
an alternative instance (or set of instances) as a result of
network issues, server unresponsiveness, or server-directed
migration.
7.7.2. Filehandles and File System Transitions
There are a number of ways in which filehandles can be handled across
a file system transition. These can be divided into two broad
classes depending upon whether the two file systems across which the
transition happens share sufficient state to effect some sort of
continuity of file system handling.
When there is no such cooperation in filehandle assignment, the two
file systems are reported as being in different handle classes. In
this case, all filehandles are assumed to expire as part of the file
system transition. Note that this behavior does not depend on the
fh_expire_type attribute and depends on the specification of the
FH4_VOL_MIGRATION bit.
When there is co-operation in filehandle assignment, the two file
systems are reported as being in the same handle class. In this
case, persistent filehandles remain valid after the file system
transition, while volatile filehandles (excluding those that are only
volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration
on the target server.
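The two cases above can be sketched as a small client-side predicate. The bit values are those defined for the fh_expire_type attribute; the function name and the same_handle_class parameter are hypothetical, since NFSv4.0 gives the client no explicit indication of handle class and that value would have to be inferred.

```python
# Bit values of the fh_expire_type attribute.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008


def filehandle_may_expire(same_handle_class: bool, fh_expire_type: int) -> bool:
    """Return True if a filehandle must be assumed able to expire
    across a file system transition.

    same_handle_class is an inferred, client-side assumption.
    """
    if not same_handle_class:
        # No cooperation in filehandle assignment: everything expires.
        return True
    # Same handle class: volatility caused only by FH4_VOL_MIGRATION is
    # ignored; any other volatility bit still allows expiration.
    other_volatility = fh_expire_type & (FH4_VOLATILE_ANY | FH4_VOL_RENAME)
    return other_volatility != 0
```

A client that gets True back would be prepared to recover filehandles by name on the target server, as it would for any expired volatile filehandle.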
7.7.3. Fileids and File System Transitions
The issue of continuity of fileids in the event of a file system
transition needs to be addressed. The general expectation is that in
situations in which the two file system instances are created by a
single vendor using some sort of file system image copy, fileids will
be consistent across the transition, while in the analogous multi-
vendor transitions they will not. This poses difficulties,
especially for the client without special knowledge of the transition
mechanisms adopted by the server. Note that although fileid is not a
REQUIRED attribute, many servers support fileids and many clients
provide APIs that depend on fileids.
It is important to note that while clients themselves may have no
trouble with a fileid changing as a result of a file system
transition event, applications do typically have access to the fileid
(e.g., via stat). The result is that an application may work
perfectly well if there is no file system instance transition or if
any such transition is among instances created by a single vendor,
yet be unable to deal with the situation in which a multi-vendor
transition occurs at the wrong time.
Providing the same fileids in a multi-vendor (multiple server
vendors) environment has generally been held to be quite difficult.
While there is work to be done, it needs to be pointed out that this
difficulty is partly self-imposed. Servers have typically identified
fileid with inode number, i.e., with a quantity used to find the file
in question. This identification poses special difficulties for
migration of a file system between vendors where assigning the same
index to a given file may not be possible. Note here that a fileid
is not required to be useful to find the file in question, only that
it is unique within the given file system. Servers prepared to
accept a fileid as a single piece of metadata and store it apart from
the value used to index the file information can relatively easily
maintain a fileid value across a migration event, allowing a truly
transparent migration event.
In any case, where servers can provide continuity of fileids, they
should, and the client should be able to find out that such
continuity is available and take appropriate action. Information
about the continuity (or lack thereof) of fileids across a file
system transition is represented by specifying whether the file
systems in question are of the same fileid class.
Note that when consistent fileids do not exist across a transition
(either because there is no continuity of fileids or because fileid
is not a supported attribute on one of the instances involved), and there
are no reliable filehandles across a transition event (either because
there is no filehandle continuity or because the filehandles are
volatile), the client is in a position where it cannot verify that
files it was accessing before the transition are the same objects.
It is forced to assume that no object has been renamed, and, unless
there are guarantees that provide this (e.g., the file system is
read-only), problems for applications may occur. Therefore, use of
such configurations should be limited to situations where the
problems that this may cause can be tolerated.
7.7.4. Fsids and File System Transitions
Since fsids are generally unique only on a per-server basis, it
is likely that they will change during a file system transition.
Clients should not make the fsids received from the server visible to
applications since they may not be globally unique, and because they
may change during a file system transition event. Applications are
best served if they are isolated from such transitions to the extent
possible.
7.7.5. The Change Attribute and File System Transitions
Since the change attribute is defined as a server-specific one,
change attributes fetched from one server are normally presumed to be
invalid on another server. Such a presumption is troublesome since
it would invalidate all cached change attributes, requiring
refetching. Even more disruptive, the absence of any assured
continuity for the change attribute means that even if the same value
is retrieved on refetch, no conclusions can be drawn as to whether
the object in question has changed. The identical change attribute
could be merely an artifact of a modified file with a different
change attribute construction algorithm, with that new algorithm just
happening to result in an identical change value.
When the two file systems have consistent change attribute formats,
in which case we say that they are in the same change class, the client may
assume a continuity of change attribute construction and handle this
situation just as it would be handled without any file system
transition.
7.7.6. Lock State and File System Transitions
In a file system transition, the client needs to handle cases in
which the two servers have cooperated in state management and in
which they have not. Cooperation by two servers in state management
requires coordination of client IDs. Before the client attempts to
use a client ID associated with one server in a request to the server
of the other file system, it must eliminate the possibility that two
non-cooperating servers have assigned the same client ID by accident.
In the case of migration, the servers involved in the migration of a
file system SHOULD transfer all server state from the original to the
new server. When this is done, it must be done in a way that is
transparent to the client. With replication, such a degree of common
state is typically not the case.
This state transfer will reduce disruption to the client when a file
system transition occurs. If the servers are successful in
transferring all state, then the client may use the existing stateids
associated with that client ID for the old file system instance,
together with that same client ID, in connection with the
transitioned file system instance.
File systems cooperating in state management may actually share state
or simply divide the identifier space so as to recognize (and reject
as stale) each other's stateids and client IDs. Servers that do
share state may not do so under all conditions or at all times. If
the server cannot be sure when accepting a client ID that it reflects
the locks the client was given, the server must treat all associated
state as stale and report it as such to the client.
The client must establish a new client ID on the destination, if it
does not have one already, and reclaim locks if allowed by the
server. In this case, old stateids and client IDs should not be
presented to the new server since there is no assurance that they
will not conflict with IDs valid on that server.
When actual locks are not known to be maintained, the destination
server may establish a grace period specific to the given file
system, with non-reclaim locks being rejected for that file system,
even though normal locks are being granted for other file systems.
Clients should not infer the absence of a grace period for file
systems being transitioned to a server from responses to requests for
other file systems.
In the case of lock reclamation for a given file system after a file
system transition, edge conditions can arise similar to those for
reclaim after server restart (although in the case of the planned
state transfer associated with migration, these can be avoided by
securely recording lock state as part of state migration). Unless
the destination server can guarantee that locks will not be
incorrectly granted, the destination server should not allow lock
reclaims and should avoid establishing a grace period. (See
Section 9.14 for further details.)
Servers are encouraged to provide facilities to allow locks to be
reclaimed on the new server after a file system transition. Often
such facilities may not be available and the client should be prepared to
re-obtain locks, even though it is possible that the client may have
its LOCK or OPEN request denied due to a conflicting lock.
The consequences of having no facilities available to reclaim locks
on the new server will depend on the type of environment. In some
environments, such as the transition between read-only file systems,
such denial of locks should not pose large difficulties in practice.
When an attempt to re-establish a lock on a new server is denied, the
client should treat the situation as if its original lock had been
revoked. Note that when the lock is granted, the client cannot
assume that no conflicting lock could have been granted in the
interim. Where change attribute continuity is present, the client
may use the change attribute to check for unwanted file
modifications. Where even this is not available, and the file system
is not read-only, a client may reasonably treat all pending locks as
having been revoked.
7.7.6.1. Transitions and the Lease_time Attribute
In order that the client may appropriately manage its lease in the
case of a file system transition, the destination server must
establish proper values for the lease_time attribute.
When state is transferred transparently, that state should include
the correct value of the lease_time attribute. The lease_time
attribute on the destination server must never be less than that on
the source, since this would result in premature expiration of a
lease granted by the source server. Upon transitions in which state
is transferred transparently, the client is under no obligation to
refetch the lease_time attribute and may continue to use the value
previously fetched (on the source server).
If state has not been transferred transparently because the client ID
is rejected when presented to the new server, the client should fetch
the value of lease_time on the new (i.e., destination) server, and
use it for subsequent locking requests. However, the server must
respect a grace period of at least as long as the lease_time on the
source server, in order to ensure that clients have ample time to
reclaim their locks before potentially conflicting non-reclaimed locks
are granted.
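The lease_time rules above can be illustrated with the following sketch. All names are hypothetical: client_id_accepted stands for the client's observation that its client ID was accepted on the destination (that is, state was transferred transparently), fetch_lease_time stands for whatever mechanism the client uses to issue a GETATTR for lease_time, and configured_grace is an assumed server-side setting.

```python
from typing import Callable


def lease_time_after_transition(
    client_id_accepted: bool,
    cached_lease_time: int,
    fetch_lease_time: Callable[[], int],
) -> int:
    """Client side: choose the lease_time (seconds) to use after a transition."""
    if client_id_accepted:
        # Transparent state transfer: the previously fetched value may be kept.
        return cached_lease_time
    # Client ID rejected: fetch the value from the destination server.
    return fetch_lease_time()


def destination_grace_period(source_lease_time: int, configured_grace: int) -> int:
    """Server side: the grace period must cover the source's lease_time."""
    return max(source_lease_time, configured_grace)
```

Taking the maximum on the server side is one simple way to satisfy the requirement that the grace period be at least as long as the source server's lease_time.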
7.7.7. Write Verifiers and File System Transitions
In a file system transition, the two file systems may be clustered in
the handling of unstably written data. When this is the case, and
the two file systems belong to the same write-verifier class, write
verifiers returned from one system may be compared to those returned
by the other and superfluous writes avoided.
When two file systems belong to different write-verifier classes, any
verifier generated by one must not be compared to one provided by the
other. Instead, it should be treated as not equal even when the
values are identical.
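A minimal sketch of the comparison rule follows. The function name and the same_write_verifier_class flag are assumptions for illustration; in NFSv4.0 the client has no explicit indication of write-verifier class and would have to infer it.

```python
def write_verifiers_match(
    write_verf: bytes, commit_verf: bytes, same_write_verifier_class: bool
) -> bool:
    """Return True only if the two verifiers may be compared and are equal.

    Verifiers from different write-verifier classes are treated as
    unequal even when their bytes happen to be identical.
    """
    if not same_write_verifier_class:
        return False
    return write_verf == commit_verf
```

A False result means the client cannot conclude that its unstable writes reached stable storage and should resend them.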
7.7.8. Readdir Cookies and Verifiers and File System Transitions
In a file system transition, the two file systems may be consistent
in their handling of READDIR cookies and verifiers. When this is the
case, and the two file systems belong to the same readdir class,
READDIR cookies and verifiers from one system may be recognized by
the other and READDIR operations started on one server may be validly
continued on the other, simply by presenting the cookie and verifier
returned by a READDIR operation done on the first file system to the
second.
When two file systems belong to different readdir classes, any
READDIR cookie and verifier generated by one is not valid on the
second, and must not be presented to that server by the client. The
client should act as if the verifier was rejected.
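The following sketch shows one way a client might choose the arguments for the next READDIR after a transition. The function name and the same_readdir_class flag are hypothetical, and restarting the enumeration from the beginning is assumed here as the client's reaction to a rejected verifier; a zero cookie and an all-zero verifier are the values used for an initial READDIR request.

```python
INITIAL_COOKIE = 0
INITIAL_VERIFIER = bytes(8)  # verifier4 is 8 bytes; all zeros on a first READDIR


def readdir_resume_args(
    saved_cookie: int, saved_verifier: bytes, same_readdir_class: bool
) -> tuple:
    """Return the (cookie, verifier) pair to send to the new server."""
    if same_readdir_class:
        # Cookies and verifiers are recognized by the other file system.
        return saved_cookie, saved_verifier
    # Different readdir class: never present the old values; start over.
    return INITIAL_COOKIE, INITIAL_VERIFIER
```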
7.7.9. File System Data and File System Transitions
When multiple replicas exist and are used simultaneously or in
succession by a client, applications using them will normally expect
that they contain either the same data or data that is consistent
with the normal sorts of changes that are made by other clients
updating the data of the file system (with metadata being the same to
the degree inferred by the fs_locations attribute). However, when
multiple file systems are presented as replicas of one another, the
precise relationship between the data of one and the data of another
is not, as a general matter, specified by the NFSv4 protocol. It is
quite possible to present as replicas file systems where the data of
those file systems is sufficiently different that some applications
have problems dealing with the transition between replicas. The
namespace will typically be constructed so that applications can
choose an appropriate level of support, so that in one position in
the namespace a varied set of replicas will be listed, while in
another only those that are up-to-date may be considered replicas.
The protocol does define four special cases of the relationship among
replicas to be specified by the server and relied upon by clients:
o When multiple server addresses correspond to the same actual
server, the client may depend on the fact that changes to data,
metadata, or locks made on one file system are immediately
reflected on others.
o When multiple replicas exist and are used simultaneously by a
client, they must designate the same data. Where file systems are
writable, a change made on one instance must be visible on all
instances, immediately upon the earlier of the return of the
modifying requester or the visibility of that change on any of the
associated replicas. This allows a client to use these replicas
simultaneously without any special adaptation to the fact that
there are multiple replicas. In this case, locks (whether share
reservations or byte-range locks), and delegations obtained on one
replica are immediately reflected on all replicas, even though
these locks will be managed under a set of client IDs.
o When one replica is designated as the successor instance to
another existing instance after the return of NFS4ERR_MOVED (i.e., the
case of migration), the client may depend on the fact that all
changes written to stable storage on the original instance are
written to stable storage of the successor (uncommitted writes are
dealt with in Section 7.7.7).
o Where a file system is not writable but represents a read-only
copy (possibly periodically updated) of a writable file system,
clients have similar requirements with regard to the propagation
of updates. They may need a guarantee that any change visible on
the original file system instance must be immediately visible on
any replica before the client transitions access to that replica,
in order to avoid any possibility that a client, in effecting a
transition to a replica, will see any reversion in file system
state. Since these file systems are presumed to be unsuitable for
simultaneous use, there is no specification of how locking is
handled; in general, locks obtained on one file system will be
separate from those on others. Since these are going to be read-
only file systems, this is not expected to pose an issue for
clients or applications.
7.8. Effecting File System Referrals
Referrals are effected when an absent file system is encountered, and Referrals are effected when an absent file system is encountered, and
one or more alternate locations are made available by the one or more alternate locations are made available by the
fs_locations attribute. The client will typically get an fs_locations attribute. The client will typically get an
NFS4ERR_MOVED error, fetch the appropriate location information, and NFS4ERR_MOVED error, fetch the appropriate location information, and
proceed to access the file system on a different server, even though proceed to access the file system on a different server, even though
it retains its logical position within the original namespace. it retains its logical position within the original namespace.
Referrals differ from migration events in that they happen only when Referrals differ from migration events in that they happen only when
the client has not previously referenced the file system in question the client has not previously referenced the file system in question
(so there is nothing to transition). Referrals can only come into (so there is nothing to transition). Referrals can only come into
effect when an absent file system is encountered at its root. effect when an absent file system is encountered at its root.
The examples given in the sections below are somewhat artificial in The examples given in the sections below are somewhat artificial in
that an actual client will not typically do a multi-component look that an actual client will not typically do a multi-component look
up, but will have cached information regarding the upper levels of up, but will have cached information regarding the upper levels of
the name hierarchy. However, these examples are chosen to make the the name hierarchy. However, these examples are chosen to make the
required behavior clear and easy to put within the scope of a small required behavior clear and easy to put within the scope of a small
number of requests, without getting unduly into details of how number of requests, without getting unduly into details of how
specific clients might choose to cache things. specific clients might choose to cache things.
7.8.1. Referral Example (LOOKUP) 8.7.1. Referral Example (LOOKUP)
Let us suppose that the following COMPOUND is sent in an environment Let us suppose that the following COMPOUND is sent in an environment
in which /this/is/the/path is absent from the target server. This in which /this/is/the/path is absent from the target server. This
may be for a number of reasons. It may be the case that the file may be for a number of reasons. It may be the case that the file
system has moved, or it may be the case that the target server is system has moved, or it may be the case that the target server is
functioning mainly, or solely, to refer clients to the servers on functioning mainly, or solely, to refer clients to the servers on
which various file systems are located. which various file systems are located.
o PUTROOTFH o PUTROOTFH
skipping to change at page 95, line 30 skipping to change at page 89, line 8
occurred (between "the" and "path"). The fs_locations attribute also occurred (between "the" and "path"). The fs_locations attribute also
gives the client the actual location of the absent file system, so gives the client the actual location of the absent file system, so
that the referral can proceed. The server gives the client the bare that the referral can proceed. The server gives the client the bare
minimum of information about the absent file system so that there minimum of information about the absent file system so that there
will be very little scope for problems of conflict between will be very little scope for problems of conflict between
information sent by the referring server and information of the file information sent by the referring server and information of the file
system's home. No filehandles and very few attributes are present on system's home. No filehandles and very few attributes are present on
the referring server, and the client can treat those it receives as the referring server, and the client can treat those it receives as
transient information with the function of enabling the referral. transient information with the function of enabling the referral.
7.8.2. Referral Example (READDIR) 8.7.2. Referral Example (READDIR)
Another context in which a client may encounter referrals is when it Another context in which a client may encounter referrals is when it
does a READDIR on a directory in which some of the sub-directories does a READDIR on a directory in which some of the sub-directories
are the roots of absent file systems. are the roots of absent file systems.
Suppose such a directory is read as follows: Suppose such a directory is read as follows:
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
skipping to change at page 97, line 6 skipping to change at page 90, line 36
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and
is within the pseudo-fs. is within the pseudo-fs.
o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid) o READDIR (rdattr_error, fsid, size, time_modify, mounted_on_fileid)
--> NFS_OK. The attributes for directory entry with the component --> NFS_OK. The attributes for directory entry with the component
named "path" will only contain rdattr_error with the value named "path" will only contain rdattr_error with the value
NFS4ERR_MOVED, together with an fsid value and a value for NFS4ERR_MOVED, together with an fsid value and a value for
mounted_on_fileid. mounted_on_fileid.
So suppose we do another READDIR to get fs_locations (although we So suppose we do another READDIR to get fs_locations (although we
could have used a GETATTR directly, as in Section 7.8.1). could have used a GETATTR directly, as in Section 8.7.1).
o PUTROOTFH o PUTROOTFH
o LOOKUP "this" o LOOKUP "this"
o LOOKUP "is" o LOOKUP "is"
o LOOKUP "the" o LOOKUP "the"
o READDIR (rdattr_error, fs_locations, mounted_on_fileid, fsid, o READDIR (rdattr_error, fs_locations, mounted_on_fileid, fsid,
skipping to change at page 98, line 5 skipping to change at page 91, line 37
o mounted_on_fileid (value: unique fileid within referring file o mounted_on_fileid (value: unique fileid within referring file
system) system)
o fsid (value: unique value within referring server) o fsid (value: unique value within referring server)
The attributes for entry "path" will not contain size or time_modify The attributes for entry "path" will not contain size or time_modify
because these attributes are not available within an absent file because these attributes are not available within an absent file
system. system.
7.9. The Attribute fs_locations 8.8. The Attribute fs_locations
The fs_locations attribute is structured in the following way: The fs_locations attribute is structured in the following way:
struct fs_location4 { struct fs_location4 {
utf8val_REQUIRED4 server<>; utf8str_cis server<>;
pathname4 rootpath; pathname4 rootpath;
}; };
struct fs_locations4 { struct fs_locations4 {
pathname4 fs_root; pathname4 fs_root;
fs_location4 locations<>; fs_location4 locations<>;
}; };
The fs_location4 data type is used to represent the location of a The fs_location4 data type is used to represent the location of a
file system by providing a server name and the path to the root of file system by providing a server name and the path to the root of
the file system within that server's namespace. When a set of the file system within that server's namespace. When a set of
servers have corresponding file systems at the same path within their servers have corresponding file systems at the same path within their
namespaces, an array of server names may be provided. An entry in namespaces, an array of server names may be provided. An entry in
the server array is a UTF-8 string and represents one of a the server array is a UTF-8 string and represents one of a
traditional DNS host name, IPv4 address, IPv6 address, or a zero- traditional DNS host name, IPv4 address, IPv6 address, or a zero-
length string. A zero-length string SHOULD be used to indicate the length string. A zero-length string SHOULD be used to indicate the
current address being used for the RPC call. It is not a requirement current address being used for the RPC call. It is not a requirement
that all servers that share the same rootpath be listed in one that all servers that share the same rootpath be listed in one
skipping to change at page 99, line 49 skipping to change at page 93, line 33
glagoli, and one element in the locations array, with server equal to glagoli, and one element in the locations array, with server equal to
serv2, and rootpath equal to /izhitsa/fita. The client replaces /az/ serv2, and rootpath equal to /izhitsa/fita. The client replaces /az/
buky/vedi/glagoli with /izhitsa/fita, and uses the latter pathname on buky/vedi/glagoli with /izhitsa/fita, and uses the latter pathname on
serv2. serv2.
Thus, the server MUST return an fs_root that is equal to the path the Thus, the server MUST return an fs_root that is equal to the path the
client used to reach the object to which the fs_locations attribute client used to reach the object to which the fs_locations attribute
applies. Otherwise, the client cannot determine the new path to use applies. Otherwise, the client cannot determine the new path to use
on the new server. on the new server.
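The path substitution described above can be sketched as follows. Paths are modelled as lists of components, mirroring pathname4. The function name is hypothetical, and carrying over components that lie below fs_root is an assumption added for illustration.

```python
from typing import List


def translate_path(client_path: List[str], fs_root: List[str],
                   rootpath: List[str]) -> List[str]:
    """Replace the fs_root prefix of client_path with rootpath."""
    if client_path[:len(fs_root)] != fs_root:
        # Without a matching fs_root the client cannot determine
        # the new path to use on the new server.
        raise ValueError("fs_root is not a prefix of the path used")
    # Keep whatever lies below fs_root (assumption of this sketch).
    return rootpath + client_path[len(fs_root):]
```

With the values from the example, translate_path(["az", "buky", "vedi", "glagoli"], ["az", "buky", "vedi", "glagoli"], ["izhitsa", "fita"]) yields ["izhitsa", "fita"], which is the pathname to use on serv2.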
7.9.1. Inferring Transition Modes 8.8.1. Inferring Transition Modes
When fs_locations is used, information about the specific locations When fs_locations is used, information about the specific locations
should be assumed based on the following rules. should be assumed based on the following rules.
The following rules are general and apply irrespective of the The following rules are general and apply irrespective of the
context. context.
o All listed file system instances should be considered as of the o All listed file system instances should be considered as of the
same handle class if and only if the current fh_expire_type same handle class if and only if the current fh_expire_type
attribute does not include the FH4_VOL_MIGRATION bit. Note that attribute does not include the FH4_VOL_MIGRATION bit. Note that
skipping to change at page 101, line 5 skipping to change at page 94, line 34
NFS4ERR_MOVED error, the target should be treated as being of a NFS4ERR_MOVED error, the target should be treated as being of a
different write-verifier class from the source. different write-verifier class from the source.
The specific choices reflect typical implementation patterns for The specific choices reflect typical implementation patterns for
failover and controlled migration, respectively. failover and controlled migration, respectively.
See Section 17 for a discussion on the recommendations for the See Section 17 for a discussion on the recommendations for the
security flavor to be used by any GETATTR operation that requests the security flavor to be used by any GETATTR operation that requests the
"fs_locations" attribute. "fs_locations" attribute.
8. NFS Server Name Space
8.1. Server Exports
On a UNIX server the name space describes all the files reachable by
pathnames under the root directory or "/". On a Windows NT server
the name space constitutes all the files on disks named by mapped
disk letters. NFS server administrators rarely make the entire
server's filesystem name space available to NFS clients. More often,
portions of the name space are made available via an "export"
feature. In previous versions of the NFS protocol, the root
filehandle for each export is obtained through the MOUNT protocol;
the client sends a string that identifies the export of name space
and the server returns the root filehandle for it. The MOUNT
protocol supports an EXPORTS procedure that will enumerate the
server's exports.
8.2. Browsing Exports
The NFSv4 protocol provides a root filehandle that clients can use to
obtain filehandles for these exports via a multi-component LOOKUP. A
common user experience is to use a graphical user interface (perhaps
a file "Open" dialog window) to find a file via progressive browsing
through a directory tree. The client must be able to move from one
export to another export via single-component, progressive LOOKUP
operations.
This style of browsing is not well supported by the NFSv2 and NFSv3
protocols. The client expects all LOOKUP operations to remain within
a single server filesystem. For example, the device attribute will
not change. This prevents a client from taking name space paths that
span exports.
An automounter on the client can obtain a snapshot of the server's
name space using the EXPORTS procedure of the MOUNT protocol. If it
understands the server's pathname syntax, it can create an image of
the server's name space on the client. The parts of the name space
that are not exported by the server are filled in with a "pseudo
filesystem" that allows the user to browse from one mounted
filesystem to another. There is a drawback to this representation of
the server's name space on the client: it is static. If the server
administrator adds a new export, the client will be unaware of it.
8.3. Server Pseudo Filesystem
NFSv4 servers avoid this name space inconsistency by presenting all
the exports within the framework of a single server name space. An
NFSv4 client uses LOOKUP and READDIR operations to browse seamlessly
from one export to another. Portions of the server name space that
are not exported are bridged via a "pseudo filesystem" that provides
a view of exported directories only. A pseudo filesystem has a
unique fsid and behaves like a normal, read-only filesystem.
Based on the construction of the server's name space, it is possible
that multiple pseudo filesystems may exist. For example,
/a pseudo filesystem
/a/b real filesystem
/a/b/c pseudo filesystem
/a/b/c/d real filesystem
Each of the pseudo filesystems is considered a separate entity and
therefore will have a unique fsid.
8.4. Multiple Roots
The DOS and Windows operating environments are sometimes described as
having "multiple roots". Filesystems are commonly represented as
disk letters. MacOS represents filesystems as top level names.
NFSv4 servers for these platforms can construct a pseudo file system
above these root names so that disk letters or volume names are
simply directory names in the pseudo root.
8.5. Filehandle Volatility
The nature of the server's pseudo filesystem is that it is a logical
representation of filesystem(s) available from the server.
Therefore, the pseudo filesystem is most likely constructed
dynamically when the server is first instantiated. It is expected
that the pseudo filesystem may not have an on-disk counterpart from
which persistent filehandles could be constructed. Even though it is
preferable that the server provide persistent filehandles for the
pseudo filesystem, the NFS client should expect that pseudo file
system filehandles are volatile. This can be confirmed by checking
the associated "fh_expire_type" attribute for those filehandles in
question. If the filehandles are volatile, the NFS client must be
prepared to recover a filehandle value (e.g., with a multi-component
LOOKUP) when receiving an error of NFS4ERR_FHEXPIRED.
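The recovery step can be sketched as a small wrapper. Everything here is hypothetical scaffolding: FhExpired stands in for an NFS4ERR_FHEXPIRED reply, lookup_path stands in for a multi-component LOOKUP from the root, and operation is any request that takes a filehandle.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class FhExpired(Exception):
    """Stands in for a reply carrying NFS4ERR_FHEXPIRED."""


def with_fh_recovery(path: List[str], cached_fh: object,
                     lookup_path: Callable[[List[str]], object],
                     operation: Callable[[object], T]) -> T:
    """Run operation with the cached filehandle, recovering once on expiry."""
    try:
        return operation(cached_fh)
    except FhExpired:
        # The volatile filehandle is no longer valid: walk the saved
        # path again to obtain a fresh one, then retry the operation.
        fresh_fh = lookup_path(path)
        return operation(fresh_fh)
```

The sketch retries only once; a second expiry is left to propagate to the caller.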
8.6. Exported Root
If the server's root filesystem is exported, one might conclude that
a pseudo-filesystem is not needed. This would be wrong. Assume the
following filesystems on a server:
/ disk1 (exported)
/a disk2 (not exported)
/a/b disk3 (exported)
Because disk2 is not exported, disk3 cannot be reached with simple
LOOKUPs. The server must bridge the gap with a pseudo-filesystem.
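The gap can be found mechanically. The sketch below is one possible approach, not a description of any server: it takes a hypothetical table mapping each mount point to whether it is exported and returns the directories that a pseudo-filesystem would have to supply.

```python
import posixpath
from typing import Dict, Optional, Set


def owning_mount(path: str, mounts: Dict[str, bool]) -> Optional[str]:
    """Return the mount point of the filesystem that contains path."""
    best = None
    for mp in mounts:
        if path == mp or mp == "/" or path.startswith(mp + "/"):
            if best is None or len(mp) > len(best):
                best = mp  # the longest matching mount point wins
    return best


def pseudo_dirs(mounts: Dict[str, bool]) -> Set[str]:
    """Ancestors of exported mount points that are not themselves exported."""
    needed = set()
    for mp, exported in mounts.items():
        if not exported:
            continue
        parent = posixpath.dirname(mp)
        while True:
            owner = owning_mount(parent, mounts)
            if owner is None or not mounts[owner]:
                needed.add(parent)  # must be bridged by the pseudo-fs
            if parent == "/":
                break
            parent = posixpath.dirname(parent)
    return needed
```

For the table above, pseudo_dirs({"/": True, "/a": False, "/a/b": True}) returns {"/a"}: the directory on disk2 is the one piece that has to be bridged.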
8.7. Mount Point Crossing
The server filesystem environment may be constructed in such a way
that one filesystem contains a directory which is 'covered' or
mounted upon by a second filesystem. For example:
/a/b (filesystem 1)
/a/b/c/d (filesystem 2)
The pseudo filesystem for this server may be constructed to look
like:
/ (place holder/not exported)
/a/b (filesystem 1)
/a/b/c/d (filesystem 2)
It is the server's responsibility to present a complete pseudo
filesystem to the client. If the client sends a lookup request
for the path "/a/b/c/d", the server's response is the filehandle of
the filesystem "/a/b/c/d". In previous versions of the NFS protocol,
the server would respond with the filehandle of directory "/a/b/c/d"
within the filesystem "/a/b".
The NFS client will be able to determine if it crosses a server mount
point by a change in the value of the "fsid" attribute.
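A client-side check along these lines might look as follows. The function name is hypothetical, and each fsid is modelled as a (major, minor) pair obtained from a GETATTR done after each LOOKUP.

```python
from typing import List, Tuple

Fsid = Tuple[int, int]  # (major, minor)


def mount_crossings(root_fsid: Fsid,
                    walk: List[Tuple[str, Fsid]]) -> List[str]:
    """Return the component names at which the fsid changes."""
    crossings = []
    previous = root_fsid
    for name, fsid in walk:
        if fsid != previous:
            crossings.append(name)  # a server mount point was crossed here
        previous = fsid
    return crossings
```

For the layout above, walking a, b, c, d with fsids (0, 0), (1, 0), (1, 0), (2, 0) from a root with fsid (0, 0) reports crossings at "b" and "d". The fsid values are made up for the example.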
8.8. Security Policy and Name Space Presentation
The application of the server's security policy needs to be carefully
considered by the implementor. One may choose to limit the
viewability of portions of the pseudo filesystem based on the
server's perception of the client's ability to authenticate itself
properly. However, with the support of multiple security mechanisms
and the ability to negotiate the appropriate use of these mechanisms,
the server is unable to properly determine if a client will be able
to authenticate itself. If, based on its policies, the server
chooses to limit the contents of the pseudo filesystem, the server
may effectively hide filesystems from a client that may otherwise
have legitimate access.
As a suggested practice, the server should apply the security policy of
a shared resource in the server's namespace to the components of the
resource's ancestors. For example:
/
/a/b
/a/b/c
The /a/b/c directory is a real filesystem and is the shared resource.
The security policy for /a/b/c is Kerberos with integrity. The
server should apply the same security policy to /, /a, and /a/b.
This allows for the extension of the protection of the server's
namespace to the ancestors of the real shared resource.
For the case of the use of multiple, disjoint security mechanisms in
the server's resources, the security for a particular object in the
server's namespace should be the union of all security mechanisms of
all direct descendants.
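The union rule can be sketched recursively. The function name, the dictionary shapes, and the flavor labels ("krb5i", "sys") are illustrative assumptions. Only shared resources carry a configured policy, and every ancestor inherits the union of what lies beneath it.

```python
from typing import Dict, List, Set


def policy_for(node: str, children: Dict[str, List[str]],
               configured: Dict[str, Set[str]]) -> Set[str]:
    """Security mechanisms to apply at node in the server's namespace."""
    flavors = set(configured.get(node, ()))
    for child in children.get(node, ()):
        # Each direct descendant contributes its own effective policy.
        flavors |= policy_for(child, children, configured)
    return flavors
```

With /a/b/c configured for Kerberos with integrity and a second, hypothetical resource /a/x configured for a different mechanism, /a/b gets only the first mechanism while /a and / get both.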
9. File Locking and Share Reservations 9. File Locking and Share Reservations
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of share reservations the protocol stateful. With the inclusion of share reservations the protocol
becomes substantially more dependent on state than the traditional becomes substantially more dependent on state than the traditional
combination of NFS and NLM (Network Lock Manager) [xnfs]. There are combination of NFS and NLM (Network Lock Manager) [xnfs]. There are
three components to making this state manageable: three components to making this state manageable:
o clear division between client and server o clear division between client and server
skipping to change at page 105, line 10 skipping to change at page 95, line 22
privileges to access a file in particular ways or at a particular privileges to access a file in particular ways or at a particular
location. location.
In all cases, there is a transition from the most general information In all cases, there is a transition from the most general information
that represents a client as a whole to the eventual lightweight that represents a client as a whole to the eventual lightweight
stateid used for most client and server locking interactions. The stateid used for most client and server locking interactions. The
details of this transition will vary with the type of object but it details of this transition will vary with the type of object but it
always starts with a client ID. always starts with a client ID.
To support Win32 share reservations it is necessary to atomically To support Win32 share reservations it is necessary to atomically
OPEN or CREATE files. Having a separate share/unshare operation OPEN or CREATE files and apply the appropriate locks in the same
would not allow correct implementation of the Win32 OpenFile API. In operation. Having a separate share/unshare operation would not allow
order to correctly implement share semantics, the previous NFS correct implementation of the Win32 OpenFile API. In order to
protocol mechanisms used when a file is opened or created (LOOKUP, correctly implement share semantics, the previous NFS protocol
CREATE, ACCESS) need to be replaced. The NFSv4 protocol has an OPEN mechanisms used when a file is opened or created (LOOKUP, CREATE,
ACCESS) need to be replaced. The NFSv4 protocol has an OPEN
operation that subsumes the NFSv3 methodology of LOOKUP, CREATE, and operation that subsumes the NFSv3 methodology of LOOKUP, CREATE, and
ACCESS. However, because many operations require a filehandle, the ACCESS. However, because many operations require a filehandle, the
traditional LOOKUP is preserved to map a file name to filehandle traditional LOOKUP is preserved to map a file name to filehandle
without establishing state on the server. The policy of granting without establishing state on the server. The policy of granting
access or modifying files is managed by the server based on the access or modifying files is managed by the server based on the
client's state. These mechanisms can implement policy ranging from client's state. These mechanisms can implement policy ranging from
advisory only locking to full mandatory locking. advisory only locking to full mandatory locking.
9.1. Opens and Byte-Range Locks 9.1. Opens and Byte-Range Locks
skipping to change at page 106, line 31 skipping to change at page 96, line 47
Client identification is encapsulated in the following structure: Client identification is encapsulated in the following structure:
struct nfs_client_id4 { struct nfs_client_id4 {
verifier4 verifier; verifier4 verifier;
opaque id<NFS4_OPAQUE_LIMIT>; opaque id<NFS4_OPAQUE_LIMIT>;
}; };
The first field, verifier is a client incarnation verifier that is The first field, verifier is a client incarnation verifier that is
used to detect client reboots. Only if the verifier is different used to detect client reboots. Only if the verifier is different
from that which the server has previously recorded the client (as from that which the server has previously recorded for the client (as
identified by the second field of the structure, id) does the server identified by the second field of the structure, id) does the server
start the process of canceling the client's leased state. start the process of canceling the client's leased state.
The second field, id is a variable length string that uniquely The second field, id is a variable length string that uniquely
defines the client. defines the client.
There are several considerations for how the client generates the id There are several considerations for how the client generates the id
string: string:
o The string should be unique so that multiple clients do not o The string should be unique so that multiple clients do not
skipping to change at page 114, line 31 skipping to change at page 104, line 45
o Otherwise, the stateid is valid and the table entry should contain o Otherwise, the stateid is valid and the table entry should contain
any additional information about the type of stateid and any additional information about the type of stateid and
information associated with that particular type of stateid, such information associated with that particular type of stateid, such
as the associated set of locks, such as open-owner and lock-owner as the associated set of locks, such as open-owner and lock-owner
information, as well as information on the specific locks, such as information, as well as information on the specific locks, such as
open modes and byte ranges. open modes and byte ranges.
9.1.3.5. Stateid Use for I/O Operations 9.1.3.5. Stateid Use for I/O Operations
Clients performing I/O operations need to select an appropriate Clients performing Input/Output (I/O) operations need to select an
stateid based on the locks (including opens and delegations) held by appropriate stateid based on the locks (including opens and
the client and the various types of state-owners sending the I/O delegations) held by the client and the various types of state-owners
requests. SETATTR operations that change the file size are treated sending the I/O requests. SETATTR operations that change the file
like I/O operations in this regard. size are treated like I/O operations in this regard.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid. In following these rules, the selection of the appropriate stateid. In following these rules,
the client will only consider locks of which it has actually received the client will only consider locks of which it has actually received
notification by an appropriate operation response or callback. notification by an appropriate operation response or callback.
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid SHOULD be used. delegation stateid SHOULD be used.
o Otherwise, if the entity corresponding to the lock-owner (e.g., a o Otherwise, if the entity corresponding to the lock-owner (e.g., a
skipping to change at page 117, line 47 skipping to change at page 108, line 10
appropriate given the OPEN with which the operation is associated. appropriate given the OPEN with which the operation is associated.
In the case of WRITE-type operations (i.e., WRITEs and SETATTRs which In the case of WRITE-type operations (i.e., WRITEs and SETATTRs which
set size), the server must verify that the access mode allows writing set size), the server must verify that the access mode allows writing
and return an NFS4ERR_OPENMODE error if it does not. In the case of and return an NFS4ERR_OPENMODE error if it does not. In the case of
READ, the server may perform the corresponding check on the access READ, the server may perform the corresponding check on the access
mode, or it may choose to allow READ on opens for WRITE only, to mode, or it may choose to allow READ on opens for WRITE only, to
accommodate clients whose write implementation may unavoidably do accommodate clients whose write implementation may unavoidably do
reads (e.g., due to buffer cache constraints). However, even if reads (e.g., due to buffer cache constraints). However, even if
READs are allowed in these circumstances, the server MUST still check READs are allowed in these circumstances, the server MUST still check
for locks that conflict with the READ (e.g., another open specify for locks that conflict with the READ (e.g., another open specifying
denial of READs). Note that a server which does enforce the access denial of READs). Note that a server which does enforce the access
mode check on READs need not explicitly check for conflicting share mode check on READs need not explicitly check for conflicting share
reservations since the existence of OPEN for read access guarantees reservations since the existence of OPEN for read access guarantees
that no conflicting share reservation can exist. that no conflicting share reservation can exist.
A stateid of all bits 1 (one) MAY allow READ operations to bypass A stateid of all bits 1 (one) MAY allow READ operations to bypass
locking checks at the server. However, WRITE operations with a locking checks at the server. However, WRITE operations with a
stateid with bits all 1 (one) MUST NOT bypass locking checks and are stateid with bits all 1 (one) MUST NOT bypass locking checks and are
treated exactly the same as if a stateid of all bits 0 were used. treated exactly the same as if a stateid of all bits 0 were used.
skipping to change at page 118, line 22 skipping to change at page 108, line 33
request conflicts with the range of the READ or WRITE operation. For request conflicts with the range of the READ or WRITE operation. For
the purposes of this paragraph, a conflict occurs when a shared lock the purposes of this paragraph, a conflict occurs when a shared lock
is requested and a WRITE operation is being performed, or an is requested and a WRITE operation is being performed, or an
exclusive lock is requested and either a READ or a WRITE operation is exclusive lock is requested and either a READ or a WRITE operation is
being performed. A SETATTR that sets size is treated similarly to a being performed. A SETATTR that sets size is treated similarly to a
WRITE as discussed above. WRITE as discussed above.
9.1.6. Sequencing of Lock Requests 9.1.6. Sequencing of Lock Requests
Locking is different than most NFS operations as it requires "at- Locking is different than most NFS operations as it requires "at-
most-one" semantics that are not provided by ONCRPC. ONCRPC over a most-one" semantics that are not provided by ONC RPC. ONC RPC over a
reliable transport is not sufficient because a sequence of locking reliable transport is not sufficient because a sequence of locking
requests may span multiple TCP connections. In the face of requests may span multiple TCP connections. In the face of
retransmission or reordering, lock or unlock requests must have a retransmission or reordering, lock or unlock requests must have a
well defined and consistent behavior. To accomplish this, each lock well defined and consistent behavior. To accomplish this, each lock
request contains a sequence number that is a consecutively increasing request contains a sequence number that is a consecutively increasing
integer. Different state-owners have different sequences. The integer. Different state-owners have different sequences. The
server maintains the last sequence number (L) received and the server maintains the last sequence number (L) received and the
response that was returned. The server is free to assign any value response that was returned. The server SHOULD assign a seqid value
for the first request issued for any given state-owner. of one for the first request issued for any given state-owner.
Note that for requests that contain a sequence number, for each Note that for requests that contain a sequence number, for each
state-owner, there should be no more than one outstanding request. state-owner, there should be no more than one outstanding request.
If a request (r) with a previous sequence number (r < L) is received, If a request (r) with a previous sequence number (r < L) is received,
it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a it is rejected with the return of error NFS4ERR_BAD_SEQID. Given a
properly-functioning client, the response to (r) must have been properly-functioning client, the response to (r) must have been
received before the last request (L) was sent. If a duplicate of received before the last request (L) was sent. If a duplicate of
last request (r == L) is received, the stored response is returned. last request (r == L) is received, the stored response is returned.
If a request beyond the next sequence (r == L + 2) is received, it is If a request beyond the next sequence (r == L + 2) is received, it is
rejected with the return of error NFS4ERR_BAD_SEQID. Sequence rejected with the return of error NFS4ERR_BAD_SEQID. Sequence
history is reinitialized whenever the SETCLIENTID/SETCLIENTID_CONFIRM history is reinitialized whenever the SETCLIENTID/SETCLIENTID_CONFIRM
sequence changes the client verifier. sequence changes the client verifier.
Since the sequence number is represented with an unsigned 32-bit Since the sequence number is represented with an unsigned 32-bit
integer, the arithmetic involved with the sequence number is mod integer, the arithmetic involved with the sequence number is mod
2^32. For an example of modulo arithmetic involving sequence numbers 2^32. Note that when the seqid wraps, it SHOULD bypass zero and use
see [RFC0793]. one as the next seqid value. For an example of modulo arithmetic
involving sequence numbers see [RFC0793].
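The sequence number handling described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the StateOwner class, its method names, and the execute callback are hypothetical. The successor function follows the -27 text in skipping zero on wrap. As a simplification, every seqid other than a replay (r == L) or the expected successor is rejected, and every executed request advances the stored seqid.

```python
SEQID_MAX = 0xFFFFFFFF  # seqid is an unsigned 32-bit integer


def next_seqid(last):
    """Return the seqid that follows `last`, skipping zero on wrap."""
    nxt = (last + 1) & SEQID_MAX
    return 1 if nxt == 0 else nxt


class StateOwner:
    """Per-state-owner record of the last seqid (L) and its response."""

    def __init__(self, last_seqid, last_response):
        self.last_seqid = last_seqid
        self.last_response = last_response

    def process(self, seqid, execute):
        if seqid == self.last_seqid:
            # Duplicate of the last request: replay the stored response.
            return self.last_response
        if seqid == next_seqid(self.last_seqid):
            # Expected next request: run it and cache the result.
            self.last_response = execute()
            self.last_seqid = seqid
            return self.last_response
        # Older than L, or beyond the next expected value.
        return "NFS4ERR_BAD_SEQID"
```

Keeping the cached response inside the state-owner record, rather than in a least-recently-used cache, means it remains available for as long as the record itself exists.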
It is critical the server maintain the last response sent to the It is critical the server maintain the last response sent to the
client to provide a more reliable cache of duplicate non-idempotent client to provide a more reliable cache of duplicate non-idempotent
requests than that of the traditional cache described in [Chet]. The requests than that of the traditional cache described in [Chet]. The
traditional duplicate request cache uses a least recently used traditional duplicate request cache uses a least recently used
algorithm for removing unneeded requests. However, the last lock algorithm for removing unneeded requests. However, the last lock
request and response on a given state-owner must be cached as long as request and response on a given state-owner must be cached as long as
the lock state exists on the server. the lock state exists on the server.
The client MUST monotonically increment the sequence number for the The client MUST monotonically increment the sequence number for the
skipping to change at page 122, line 25 skipping to change at page 112, line 39
NFS4ERR_EXPIRED). A server which supports delegations can be sure NFS4ERR_EXPIRED). A server which supports delegations can be sure
that no open-owners for that client have been recycled since client that no open-owners for that client have been recycled since client
initialization or deletion of lease state and thus can ensure that initialization or deletion of lease state and thus can ensure that
confirmation will not be required. confirmation will not be required.
9.2. Lock Ranges 9.2. Lock Ranges
The protocol allows a lock owner to request a lock with a byte range The protocol allows a lock owner to request a lock with a byte range
and then either upgrade or unlock a sub-range of the initial lock. and then either upgrade or unlock a sub-range of the initial lock.
It is expected that this will be an uncommon type of request. In any It is expected that this will be an uncommon type of request. In any
case, servers or server filesystems may not be able to support sub- case, servers or server file systems may not be able to support sub-
range lock semantics. In the event that a server receives a locking range lock semantics. In the event that a server receives a locking
request that represents a sub-range of current locking state for the request that represents a sub-range of current locking state for the
lock owner, the server is allowed to return the error lock owner, the server is allowed to return the error
NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock
operations. Therefore, the client should be prepared to receive this operations. Therefore, the client should be prepared to receive this
error and, if appropriate, report the error to the requesting error and, if appropriate, report the error to the requesting
application. application.
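A server that does not support sub-range operations might detect them along the following lines. The sketch assumes finite (offset, length) ranges and a single held lock per lock owner. The function names and the supports_sub_range flag are hypothetical.

```python
def is_sub_range(req_offset, req_length, held_offset, held_length):
    """True if the request lies inside the held range without matching it."""
    req_end = req_offset + req_length
    held_end = held_offset + held_length
    inside = held_offset <= req_offset and req_end <= held_end
    return inside and (req_offset, req_length) != (held_offset, held_length)


def check_lock_request(req, held, supports_sub_range):
    """req and held are (offset, length) tuples for the same lock owner."""
    if not supports_sub_range and is_sub_range(*req, *held):
        return "NFS4ERR_LOCK_RANGE"
    return "NFS4_OK"
```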
The client is discouraged from combining multiple independent locking The client is discouraged from combining multiple independent locking
ranges that happen to be adjacent into a single request since the ranges that happen to be adjacent into a single request since the
skipping to change at page 123, line 32 skipping to change at page 113, line 44
granted. Clients have no choice but to continually poll for the granted. Clients have no choice but to continually poll for the
lock. This presents a fairness problem. Two new lock types are lock. This presents a fairness problem. Two new lock types are
added, READW and WRITEW, and are used to indicate to the server that added, READW and WRITEW, and are used to indicate to the server that
the client is requesting a blocking lock. The server should maintain the client is requesting a blocking lock. The server should maintain
an ordered list of pending blocking locks. When the conflicting lock an ordered list of pending blocking locks. When the conflicting lock
is released, the server may wait the lease period for the first is released, the server may wait the lease period for the first
waiting client to re-request the lock. After the lease period waiting client to re-request the lock. After the lease period
expires the next waiting client request is allowed the lock. Clients expires the next waiting client request is allowed the lock. Clients
are required to poll at an interval sufficiently small that it is are required to poll at an interval sufficiently small that it is
likely to acquire the lock in a timely manner. The server is not likely to acquire the lock in a timely manner. The server is not
required to maintain a list of pending blocked locks as it is used to required to maintain a list of pending blocked locks as it is not
increase fairness and not correct operation. Because of the used to provide correct operation but only to increase fairness.
unordered nature of crash recovery, storing of lock state to stable Because of the unordered nature of crash recovery, storing of lock
storage would be required to guarantee ordered granting of blocking state to stable storage would be required to guarantee ordered
locks. granting of blocking locks.
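One possible shape for the ordered list of pending blocking locks is sketched below. The class, its methods, and the use of caller-supplied timestamps are assumptions made for illustration. The sketch keeps the list only in memory, so, as noted above, ordering is not preserved across a server restart.

```python
from collections import deque


class BlockingLockQueue:
    """Pending READW/WRITEW requesters for one contended lock."""

    def __init__(self, lease_period):
        self.lease_period = lease_period
        self.waiters = deque()    # clients in the order they were denied
        self.released_at = None   # None while the lock is still held

    def note_denied(self, client):
        if client not in self.waiters:
            self.waiters.append(client)

    def lock_released(self, now):
        self.released_at = now

    def try_grant(self, client, now):
        """Decide whether a polling request from `client` gets the lock."""
        if self.released_at is None:
            return False
        head = self.waiters[0] if self.waiters else None
        window_open = (now - self.released_at) < self.lease_period
        if head is not None and client != head and window_open:
            # Hold the lock for the first waiter until a lease period passes.
            return False
        if client in self.waiters:
            self.waiters.remove(client)
        self.released_at = None
        return True
```

The first waiter is favored only for one lease period after release. After that, whichever waiting client polls next obtains the lock.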
Servers may also note the lock types and delay returning denial of Servers may also note the lock types and delay returning denial of
the request to allow extra time for a conflicting lock to be the request to allow extra time for a conflicting lock to be
released, allowing a successful return. In this way, clients can released, allowing a successful return. In this way, clients can
avoid the burden of needlessly frequent polling for blocking locks. avoid the burden of needlessly frequent polling for blocking locks.
The server should take care in the length of delay in the event the The server should take care in the length of delay in the event the
client retransmits the request. client retransmits the request.
If a server receives a blocking lock request, denies it, and then If a server receives a blocking lock request, denies it, and then
later receives a nonblocking request for the same lock, which is also later receives a nonblocking request for the same lock, which is also
skipping to change at page 136, line 25 skipping to change at page 126, line 33
have occurred on the server and thus determine if it is possible that have occurred on the server and thus determine if it is possible that
a lease period expiration could have occurred. a lease period expiration could have occurred.
The third lock revocation event can occur as a result of The third lock revocation event can occur as a result of
administrative intervention within the lease period. While this is administrative intervention within the lease period. While this is
considered a rare event, it is possible that the server's considered a rare event, it is possible that the server's
administrator has decided to release or revoke a particular lock held administrator has decided to release or revoke a particular lock held
by the client. As a result of revocation, the client will receive an by the client. As a result of revocation, the client will receive an
error of NFS4ERR_ADMIN_REVOKED. In this instance the client may error of NFS4ERR_ADMIN_REVOKED. In this instance the client may
assume that only the state-owner's locks have been lost. The client assume that only the state-owner's locks have been lost. The client
notifies the lock holder appropriately. The client may not assume notifies the lock holder appropriately. The client MUST NOT assume
the lease period has been renewed as a result of a failed operation. the lease period has been renewed as a result of a failed operation.
When the client determines the lease period may have expired, the When the client determines the lease period may have expired, the
client must mark all locks held for the associated lease as client must mark all locks held for the associated lease as
"unvalidated". This means the client has been unable to re-establish "unvalidated". This means the client has been unable to re-establish
or confirm the appropriate lock state with the server. As described or confirm the appropriate lock state with the server. As described
in Section 9.6, there are scenarios in which the server may grant in Section 9.6, there are scenarios in which the server may grant
conflicting locks after the lease period has expired for a client. conflicting locks after the lease period has expired for a client.
When it is possible that the lease period has expired, the client When it is possible that the lease period has expired, the client
must validate each lock currently held to ensure that a conflicting must validate each lock currently held to ensure that a conflicting
skipping to change at page 140, line 47 skipping to change at page 131, line 16
When responsibility for handling a given file system is transferred When responsibility for handling a given file system is transferred
to a new server (migration) or the client chooses to use an alternate to a new server (migration) or the client chooses to use an alternate
server (e.g., in response to server unresponsiveness) in the context server (e.g., in response to server unresponsiveness) in the context
of file system replication, the appropriate handling of state shared of file system replication, the appropriate handling of state shared
between the client and server (i.e., locks, leases, stateids, and between the client and server (i.e., locks, leases, stateids, and
client IDs) is as described below. The handling differs between client IDs) is as described below. The handling differs between
migration and replication. For related discussion of file server migration and replication. For related discussion of file server
state and recover of such see the sections under Section 9.6. state and recover of such see the sections under Section 9.6.
If a server replica or a server immigrating a filesystem agrees to, If a server replica or a server immigrating a file system agrees to,
or is expected to, accept opaque values from the client that or is expected to, accept opaque values from the client that
originated from another server, then it is a wise implementation originated from another server, then servers SHOULD encode the
practice for the servers to encode the "opaque" values in network "opaque" values in network byte order. This way, servers acting as
byte order. This way, servers acting as replicas or immigrating replicas or immigrating file systems will be able to parse values
filesystems will be able to parse values like stateids, directory like stateids, directory cookies, filehandles, etc. even if their
cookies, filehandles, etc. even if their native byte order is native byte order is different from other servers cooperating in the
different from other servers cooperating in the replication and replication and migration of the file system.
migration of the filesystem.
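As an illustration of encoding an "opaque" value in network byte order, the sketch below packs a hypothetical layout for the 12-byte "other" field of a stateid. The two fields (a boot counter and a state index) are assumptions made for the example. The draft does not define the contents of this field.

```python
import struct

# Hypothetical layout for the 12-byte "other" field of a stateid:
# a 4-byte server boot counter followed by an 8-byte state index.
_OTHER_FMT = "!IQ"  # "!" selects network (big-endian) order, no padding


def encode_other(boot_count, state_index):
    """Pack the fields so that any cooperating server can parse them."""
    return struct.pack(_OTHER_FMT, boot_count, state_index)


def decode_other(other):
    """Return (boot_count, state_index) regardless of host byte order."""
    return struct.unpack(_OTHER_FMT, other)
```

Because the format string fixes the byte order, a big-endian and a little-endian server produce and parse identical bytes.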
9.14.1. Migration and State 9.14.1. Migration and State
In the case of migration, the servers involved in the migration of a In the case of migration, the servers involved in the migration of a
filesystem SHOULD transfer all server state from the original to the file system SHOULD transfer all server state from the original to the
new server. This must be done in a way that is transparent to the new server. This must be done in a way that is transparent to the
client. This state transfer will ease the client's transition when a client. This state transfer will ease the client's transition when a
filesystem migration occurs. If the servers are successful in file system migration occurs. If the servers are successful in
transferring all state, the client will continue to use stateids transferring all state, the client will continue to use stateids
assigned by the original server. Therefore the new server must assigned by the original server. Therefore the new server must
recognize these stateids as valid. This holds true for the client ID recognize these stateids as valid. This holds true for the client ID
as well. Since responsibility for an entire filesystem is as well. Since responsibility for an entire file system is
transferred with a migration event, there is no possibility that transferred with a migration event, there is no possibility that
conflicts will arise on the new server as a result of the transfer of conflicts will arise on the new server as a result of the transfer of
locks. locks.
As part of the transfer of information between servers, leases would As part of the transfer of information between servers, leases would
be transferred as well. The leases being transferred to the new be transferred as well. The leases being transferred to the new
server will typically have a different expiration time from those for server will typically have a different expiration time from those for
the same client, previously on the old server. To maintain the the same client, previously on the old server. To maintain the
property that all leases on a given server for a given client expire property that all leases on a given server for a given client expire
at the same time, the server should advance the expiration time to at the same time, the server should advance the expiration time to
skipping to change at page 142, line 18 skipping to change at page 132, line 30
server control, the handling of state is different. In this case, server control, the handling of state is different. In this case,
leases, stateids and client IDs do not have validity across a leases, stateids and client IDs do not have validity across a
transition from one server to another. The client must re-establish transition from one server to another. The client must re-establish
its locks on the new server. This can be compared to the re- its locks on the new server. This can be compared to the re-
establishment of locks by means of reclaim-type requests after a establishment of locks by means of reclaim-type requests after a
server reboot. The difference is that the server has no provision to server reboot. The difference is that the server has no provision to
distinguish requests reclaiming locks from those obtaining new locks distinguish requests reclaiming locks from those obtaining new locks
or to defer the latter. Thus, a client re-establishing a lock on the or to defer the latter. Thus, a client re-establishing a lock on the
new server (by means of a LOCK or OPEN request), may have the new server (by means of a LOCK or OPEN request), may have the
requests denied due to a conflicting lock. Since replication is requests denied due to a conflicting lock. Since replication is
intended for read-only use of filesystems, such denial of locks intended for read-only use of file systems, such denial of locks
should not pose large difficulties in practice. When an attempt to should not pose large difficulties in practice. When an attempt to
re-establish a lock on a new server is denied, the client should re-establish a lock on a new server is denied, the client should
treat the situation as if his original lock had been revoked. treat the situation as if his original lock had been revoked.
9.14.3. Notification of Migrated Lease 9.14.3. Notification of Migrated Lease
In the case of lease renewal, the client may not be submitting In the case of lease renewal, the client may not be submitting
requests for a filesystem that has been migrated to another server. requests for a file system that has been migrated to another server.
This can occur because of the implicit lease renewal mechanism. The This can occur because of the implicit lease renewal mechanism. The
client renews leases for all filesystems when submitting a request to client renews leases for all file systems when submitting a request
any one filesystem at the server. to any one file system at the server.
In order for the client to schedule renewal of leases that may have In order for the client to schedule renewal of leases that may have
been relocated to the new server, the client must find out about been relocated to the new server, the client must find out about
lease relocation before those leases expire. To accomplish this, all lease relocation before those leases expire. To accomplish this, all
operations which implicitly renew leases for a client (such as OPEN, operations which implicitly renew leases for a client (such as OPEN,
CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error
NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
renewed has been transferred to a new server. This condition will renewed has been transferred to a new server. This condition will
continue until the client receives an NFS4ERR_MOVED error and the continue until the client receives an NFS4ERR_MOVED error and the
server receives the subsequent GETATTR(fs_locations) for an access to server receives the subsequent GETATTR(fs_locations) for an access to
each filesystem for which a lease has been moved to a new server. By each file system for which a lease has been moved to a new server.
convention, the compound including the GETATTR(fs_locations) SHOULD By convention, the compound including the GETATTR(fs_locations)
append a RENEW operation to permit the server to identify the client SHOULD append a RENEW operation to permit the server to identify the
doing the access. client doing the access.
Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
filesystem migration MUST probe all filesystems from that server on file system migration MUST probe all file systems from that server on
which it holds open state. Once the client has successfully probed which it holds open state. Once the client has successfully probed
all those filesystems which are migrated, the server MUST resume all those file systems which are migrated, the server MUST resume
normal handling of stateful requests from that client. normal handling of stateful requests from that client.
In order to support legacy clients that do not handle the In order to support legacy clients that do not handle the
NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
a wait of at least two lease periods, at which time it will resume a wait of at least two lease periods, at which time it will resume
normal handling of stateful requests from all clients. If a client normal handling of stateful requests from all clients. If a client
attempts to access the migrated files, the server MUST reply attempts to access the migrated files, the server MUST reply
NFS4ERR_MOVED. NFS4ERR_MOVED.
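The server-side behavior described in this section can be sketched as follows. The class and method names are hypothetical, file systems and clients are represented by plain strings, and a single migration time is assumed for simplicity. The timeout uses exactly two lease periods, the minimum the text permits.

```python
class LeaseMovedTracker:
    """Tracks which clients still have to probe migrated file systems."""

    def __init__(self, lease_period, migrated_at):
        # After this time, legacy clients are no longer held up.
        self.deadline = migrated_at + 2 * lease_period
        self.migrated = set()   # file systems now served elsewhere
        self.pending = {}       # client -> migrated fs not yet probed

    def migrate(self, fsid, clients_with_state):
        self.migrated.add(fsid)
        for client in clients_with_state:
            self.pending.setdefault(client, set()).add(fsid)

    def renewing_op(self, client, fsid, now):
        """Status for an operation that implicitly renews leases."""
        if fsid in self.migrated:
            return "NFS4ERR_MOVED"
        if now < self.deadline and self.pending.get(client):
            return "NFS4ERR_LEASE_MOVED"
        return "NFS4_OK"

    def getattr_fs_locations(self, client, fsid):
        """The client has probed fsid; stop reporting it as pending."""
        self.pending.get(client, set()).discard(fsid)
```

In this sketch, accesses to a migrated file system keep returning NFS4ERR_MOVED after the timeout. Only the NFS4ERR_LEASE_MOVED condition on other file systems is lifted.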
When the client receives an NFS4ERR_MOVED error, the client can When the client receives an NFS4ERR_MOVED error, the client can
skipping to change at page 146, line 16 skipping to change at page 136, line 29
is an associated lease that is subject to renewal together with all is an associated lease that is subject to renewal together with all
of the other leases held by that client. of the other leases held by that client.
Unlike locks, an operation by a second client to a delegated file Unlike locks, an operation by a second client to a delegated file
will cause the server to recall a delegation through a callback. will cause the server to recall a delegation through a callback.
On recall, the client holding the delegation must flush modified On recall, the client holding the delegation must flush modified
state (such as modified data) to the server and return the state (such as modified data) to the server and return the
delegation. The conflicting request will not be acted on until the delegation. The conflicting request will not be acted on until the
recall is complete. The recall is considered complete when the recall is complete. The recall is considered complete when the
client returns the delegation or the server times its wait for the client returns the delegation or the server times out its wait for
delegation to be returned and revokes the delegation as a result of the delegation to be returned and revokes the delegation as a result
the timeout. In the interim, the server will either delay responding of the timeout. In the interim, the server will either delay
to conflicting requests or respond to them with NFS4ERR_DELAY. responding to conflicting requests or respond to them with
Following the resolution of the recall, the server has the NFS4ERR_DELAY. Following the resolution of the recall, the server
information necessary to grant or deny the second client's request. has the information necessary to grant or deny the second client's
request.
At the time the client receives a delegation recall, it may have At the time the client receives a delegation recall, it may have
substantial state that needs to be flushed to the server. Therefore, substantial state that needs to be flushed to the server. Therefore,
the server should allow sufficient time for the delegation to be the server should allow sufficient time for the delegation to be
returned since it may involve numerous RPCs to the server. If the returned since it may involve numerous RPCs to the server. If the
server is able to determine that the client is diligently flushing server is able to determine that the client is diligently flushing
state to the server as a result of the recall, the server may extend state to the server as a result of the recall, the server MAY extend
the usual time allowed for a recall. However, the time allowed for the usual time allowed for a recall. However, the time allowed for
recall completion should not be unbounded. recall completion SHOULD NOT be unbounded.
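One way a server might bound the recall wait while still rewarding a client that is actively flushing is sketched below. All parameter names and the policy itself are assumptions made for illustration. The draft leaves the choice of timeouts to the implementation.

```python
def recall_deadline(recall_start, base_timeout, last_flush_time, max_timeout):
    """Return the time at which the delegation would be revoked.

    The deadline slides forward while the client keeps flushing state,
    but never beyond recall_start + max_timeout.
    """
    deadline = recall_start + base_timeout
    if last_flush_time is not None:
        # Client is making progress: extend from its most recent flush.
        deadline = max(deadline, last_flush_time + base_timeout)
    return min(deadline, recall_start + max_timeout)
```

The hard cap keeps the wait bounded even when the client continues to send data indefinitely.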
An example of this is when responsibility to mediate opens on a given An example of this is when responsibility to mediate opens on a given
file is delegated to a client (see Section 10.4). The server will file is delegated to a client (see Section 10.4). The server will
not know what opens are in effect on the client. Without this not know what opens are in effect on the client. Without this
knowledge the server will be unable to determine if the access and knowledge the server will be unable to determine if the access and
deny state for the file allows any particular open until the deny state for the file allows any particular open until the
delegation for the file has been returned. delegation for the file has been returned.
A client failure or a network partition can result in failure to A client failure or a network partition can result in failure to
respond to a recall callback. In this case, the server will revoke respond to a recall callback. In this case, the server will revoke
skipping to change at page 147, line 11 skipping to change at page 137, line 26
delegation request results in a revoke. The client could then delegation request results in a revoke. The client could then
determine which delegations it may not need and preemptively release determine which delegations it may not need and preemptively release
them. them.
10.2.1. Delegation Recovery 10.2.1. Delegation Recovery
There are three situations that delegation recovery must deal with: There are three situations that delegation recovery must deal with:
o Client reboot or restart o Client reboot or restart
o Server reboot or restart o Server reboot or restart (see Section 9.6.3.1)
o Network partition (full or callback-only) o Network partition (full or callback-only)
In the event the client reboots or restarts, the confirmation of a In the event the client reboots or restarts, the confirmation of a
SETCLIENTID done with an nfs_client_id4 with a new verifier4 value SETCLIENTID done with an nfs_client_id4 with a new verifier4 value
will result in the release of byte-range locks and share will result in the release of byte-range locks and share
reservations. Delegations, however, may be treated a bit reservations. Delegations, however, may be treated a bit
differently. differently.
There will be situations in which delegations will need to be There will be situations in which delegations will need to be
skipping to change at page 148, line 45 skipping to change at page 139, line 11
Support for a claim type of CLAIM_DELEGATE_PREV, is often referred to Support for a claim type of CLAIM_DELEGATE_PREV, is often referred to
as providing for "client-persistent delegations" in that they allow as providing for "client-persistent delegations" in that they allow
use of client persistent storage on the client to store data written use of client persistent storage on the client to store data written
by the client, even across a client restart. It should be noted by the client, even across a client restart. It should be noted
that, with the optional exception noted below, this feature requires that, with the optional exception noted below, this feature requires
persistent storage to be used on the client and does not add to persistent storage to be used on the client and does not add to
persistent storage requirements on the server. persistent storage requirements on the server.
One good way to think about client-persistent delegations is that for One good way to think about client-persistent delegations is that for
the most part, they function like "courtesy locks", with a special the most part, they function like "courtesy locks", with special
semantic adjustments to allow them to be retained across a client semantic adjustments to allow them to be retained across a client
restart, which cause all other sorts of locks to be freed. Such restart, which cause all other sorts of locks to be freed. Such
locks are generally not retained across a server restart. The one locks are generally not retained across a server restart. The one
exception is the case of simultaneous failure of the client and exception is the case of simultaneous failure of the client and
server and is discussed below. server and is discussed below.
When the server indicates support of CLAIM_DELEGATE_PREV (implicitly) When the server indicates support of CLAIM_DELEGATE_PREV (implicitly)
by returning NFS_OK to DELEGPURGE, a client with a write delegation, by returning NFS_OK to DELEGPURGE, a client with a write delegation,
can use write-back caching for data to be written to the server, can use write-back caching for data to be written to the server,
deferring the write-back, until such time as the delegation is deferring the write-back, until such time as the delegation is
skipping to change at page 149, line 27 skipping to change at page 139, line 42
semantic difference. In the normal case, if the server decides that semantic difference. In the normal case, if the server decides that
a delegation should not be granted, it performs the requested action a delegation should not be granted, it performs the requested action
(e.g., OPEN) without granting any delegation. For reclaim, the (e.g., OPEN) without granting any delegation. For reclaim, the
server grants the delegation but a special designation is applied so server grants the delegation but a special designation is applied so
that the client treats the delegation as having been granted but that the client treats the delegation as having been granted but
recalled by the server. Because of this, the client has the duty to recalled by the server. Because of this, the client has the duty to
write all modified state to the server and then return the write all modified state to the server and then return the
delegation. This process of handling delegation reclaim reconciles delegation. This process of handling delegation reclaim reconciles
three principles of the NFSv4 protocol: three principles of the NFSv4 protocol:
o Upon reclaim, a client reporting resources assigned to it by an o Upon reclaim, a client claiming resources assigned to it by an
earlier server instance must be granted those resources. earlier server instance must be granted those resources.
o The server has unquestionable authority to determine whether o The server has unquestionable authority to determine whether
delegations are to be granted and, once granted, whether they are delegations are to be granted and, once granted, whether they are
to be continued. to be continued.
o The use of callbacks is not to be depended upon until the client o The use of callbacks is not to be depended upon until the client
has proven its ability to receive them. has proven its ability to receive them.
When a client has more than a single open associated with a When a client has more than a single open associated with a
delegation, state for those additional opens can be established using delegation, state for those additional opens can be established using
OPEN operations of type CLAIM_DELEGATE_CUR. When these are used to OPEN operations of type CLAIM_DELEGATE_CUR. When these are used to
establish opens associated with reclaimed delegations, the server establish opens associated with reclaimed delegations, the server
MUST allow them when made within the grace period. MUST allow them when made within the grace period.
Situations in which there us a series of client and server restarts Situations in which there is a series of client and server restarts
where there is no restart of both at the same time, are dealt with where there is no restart of both at the same time, are dealt with
via a combination of CLAIM_DELEGATE_PREV and CLAIM_PREVIOUS reclaim via a combination of CLAIM_DELEGATE_PREV and CLAIM_PREVIOUS reclaim
cycles. Persistent storage is needed only on the client. For each cycles. Persistent storage is needed only on the client. For each
server failure, a CLAIM_PREVIOUS reclaim cycle is done, while for server failure, a CLAIM_PREVIOUS reclaim cycle is done, while for
each client restart, a CLAIM_DELEGATE_PREV reclaim cycle is done. each client restart, a CLAIM_DELEGATE_PREV reclaim cycle is done.
To deal with the possibility of simultaneous failure of client and To deal with the possibility of simultaneous failure of client and
server (e.g., a data center power outage), the server MAY server (e.g., a data center power outage), the server MAY
persistently store delegation information so that it can respond to a persistently store delegation information so that it can respond to a
CLAIM_DELEGATE_PREV reclaim request which it receives from a CLAIM_DELEGATE_PREV reclaim request which it receives from a
skipping to change at page 151, line 9 skipping to change at page 141, line 24
delegation but before the client holding the revoked delegation is delegation but before the client holding the revoked delegation is
notified about the revocation. notified about the revocation.
Note that when there is a loss of a delegation, due to a network Note that when there is a loss of a delegation, due to a network
partition in which all locks associated with the lease are lost, the partition in which all locks associated with the lease are lost, the
client will also receive the error NFS4ERR_EXPIRED. This case can be client will also receive the error NFS4ERR_EXPIRED. This case can be
distinguished from other situations in which delegations are revoked distinguished from other situations in which delegations are revoked
by seeing that the associated clientid becomes invalid so that by seeing that the associated clientid becomes invalid so that
NFS4ERR_STALE_CLIENTID is returned when it is used. NFS4ERR_STALE_CLIENTID is returned when it is used.
When NFS4ERR_EXPIRED Is returned, the server MAY retain information When NFS4ERR_EXPIRED is returned, the server MAY retain information
about the delegations held by the client, deleting those that are about the delegations held by the client, deleting those that are
invalidated by a conflicting request. Retaining such information invalidated by a conflicting request. Retaining such information
will allow the client to recover all non-invalidated delegations will allow the client to recover all non-invalidated delegations
using the claim type CLAIM_DELEGATE_PREV, once the using the claim type CLAIM_DELEGATE_PREV, once the
SETCLIENTID_CONFIRM is done to recover. Attempted recovery of a SETCLIENTID_CONFIRM is done to recover. Attempted recovery of a
delegation that the client has no record of, typically because they delegation that the client has no record of, typically because they
were invalidated by conflicting requests, will get the error were invalidated by conflicting requests, will get the error
NFS4ERR_BAD_RECLAIM. Once a reclaim is attempted for all delegations NFS4ERR_BAD_RECLAIM. Once a reclaim is attempted for all delegations
that the client held, it SHOULD do a DELEGPURGE to allow any that the client held, it SHOULD do a DELEGPURGE to allow any
remaining server delegation information to be freed. remaining server delegation information to be freed.
skipping to change at page 152, line 15 skipping to change at page 142, line 32
the data for the OPENed file is still correctly reflected in the the data for the OPENed file is still correctly reflected in the
client's cache. This validation must be done at least when the client's cache. This validation must be done at least when the
client's OPEN operation includes DENY=WRITE or BOTH thus client's OPEN operation includes DENY=WRITE or BOTH thus
terminating a period in which other clients may have had the terminating a period in which other clients may have had the
opportunity to open the file with WRITE access. Clients may opportunity to open the file with WRITE access. Clients may
choose to do the revalidation more often (i.e., at OPENs choose to do the revalidation more often (i.e., at OPENs
specifying DENY=NONE) to parallel the NFSv3 protocol's practice specifying DENY=NONE) to parallel the NFSv3 protocol's practice
for the benefit of users assuming this degree of cache for the benefit of users assuming this degree of cache
revalidation. Since the change attribute is updated for data and revalidation. Since the change attribute is updated for data and
metadata modifications, some client implementors may be tempted to metadata modifications, some client implementors may be tempted to
use the time_modify attribute and not change to validate cached use the time_modify attribute and not the change attribute to
data, so that metadata changes do not spuriously invalidate clean validate cached data, so that metadata changes do not spuriously
data. The implementor is cautioned in this approach. The change invalidate clean data. The implementor is cautioned in this
attribute is guaranteed to change for each update to the file, approach. The change attribute is guaranteed to change for each
whereas time_modify is guaranteed to change only at the update to the file, whereas time_modify is guaranteed to change
granularity of the time_delta attribute. Use by the client's data only at the granularity of the time_delta attribute. Use by the
cache validation logic of time_modify and not change runs the risk client's data cache validation logic of time_modify and not the
of the client incorrectly marking stale data as valid. change attribute runs the risk of the client incorrectly marking
stale data as valid.
o Second, modified data must be flushed to the server before closing o Second, modified data must be flushed to the server before closing
a file OPENed for write. This is complementary to the first rule. a file OPENed for write. This is complementary to the first rule.
If the data is not flushed at CLOSE, the revalidation done after If the data is not flushed at CLOSE, the revalidation done after
client OPENs as file is unable to achieve its purpose. The other the client OPENs a file is unable to achieve its purpose. The
aspect to flushing the data before close is that the data must be other aspect to flushing the data before close is that the data
committed to stable storage, at the server, before the CLOSE must be committed to stable storage, at the server, before the
operation is requested by the client. In the case of a server CLOSE operation is requested by the client. In the case of a
reboot or restart and a CLOSEd file, it may not be possible to server reboot or restart and a CLOSEd file, it may not be possible
retransmit the data to be written to the file. Hence, this to retransmit the data to be written to the file. Hence, this
requirement. requirement.
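
The caution about time_modify can be illustrated with a minimal sketch. The CachedFile record and the two helper functions are hypothetical names introduced here, not part of the protocol. The scenario assumes two updates land within one time_delta tick, so the change attribute advances while time_modify does not.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    """Hypothetical client-side cache entry for one file."""
    change: int       # change attribute observed when the data was cached
    time_modify: int  # time_modify observed at the same moment
    data: bytes


def valid_by_change(entry: CachedFile, server_change: int) -> bool:
    # The change attribute is guaranteed to differ after every update.
    return entry.change == server_change


def valid_by_time_modify(entry: CachedFile, server_time_modify: int) -> bool:
    # time_modify only moves at the granularity of time_delta.
    return entry.time_modify == server_time_modify


# Data cached after a first update (change=7) at tick 1000.
entry = CachedFile(change=7, time_modify=1000, data=b"old")

# A second update by another client arrives within the same tick.
server_change, server_time_modify = 8, 1000

stale_detected = not valid_by_change(entry, server_change)
stale_missed = valid_by_time_modify(entry, server_time_modify)
```

In this scenario the comparison on the change attribute detects the stale data, while the comparison on time_modify wrongly accepts it.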
10.3.2. Data Caching and File Locking 10.3.2. Data Caching and File Locking
For those applications that choose to use file locking instead of For those applications that choose to use file locking instead of
share reservations to exclude inconsistent file access, there is an share reservations to exclude inconsistent file access, there is an
analogous set of constraints that apply to client side data caching. analogous set of constraints that apply to client side data caching.
These rules are effective only if the file locking is used in a way These rules are effective only if the file locking is used in a way
that matches in an equivalent way the actual READ and WRITE that matches in an equivalent way the actual READ and WRITE
operations executed. This is as opposed to file locking that is operations executed. This is as opposed to file locking that is
based on pure convention. For example, it is possible to manipulate based on pure convention. For example, it is possible to manipulate
a two-megabyte file by dividing the file into two one-megabyte a two-megabyte file by dividing the file into two one-megabyte
regions and protecting access to the two regions by file locks on regions and protecting access to the two regions by file locks on
bytes zero and one. A lock for write on byte zero of the file would bytes zero and one. A lock for write on byte zero of the file would
represent the right to do READ and WRITE operations on the first represent the right to do READ and WRITE operations on the first
region. A lock for write on byte one of the file would represent the region. A lock for write on byte one of the file would represent the
right to do READ and WRITE operations on the second region. As long right to do READ and WRITE operations on the second region. As long
as all applications manipulating the file obey this convention, they as all applications manipulating the file obey this convention, they
will work on a local filesystem. However, they may not work with the will work on a local file system. However, they may not work with
NFSv4 protocol unless clients refrain from data caching. the NFSv4 protocol unless clients refrain from data caching.
The rules for data caching in the file locking environment are: The rules for data caching in the file locking environment are:
o First, when a client obtains a file lock for a particular region, o First, when a client obtains a file lock for a particular region,
the data cache corresponding to that region (if any cached data the data cache corresponding to that region (if any cached data
exists) must be revalidated. If the change attribute indicates exists) must be revalidated. If the change attribute indicates
that the file may have been updated since the cached data was that the file may have been updated since the cached data was
obtained, the client must flush or invalidate the cached data for obtained, the client must flush or invalidate the cached data for
the newly locked region. A client might choose to invalidate all the newly locked region. A client might choose to invalidate all
of non-modified cached data that it has for the file but the only of non-modified cached data that it has for the file but the only
skipping to change at page 154, line 37 skipping to change at page 145, line 8
appropriate file lock is not held for the range of the read or write, appropriate file lock is not held for the range of the read or write,
the read or write request must not be satisfied by the client's cache the read or write request must not be satisfied by the client's cache
and the request must be sent to the server for processing. When a and the request must be sent to the server for processing. When a
read or write request partially overlaps a locked region, the request read or write request partially overlaps a locked region, the request
should be subdivided into multiple pieces with each region (locked or should be subdivided into multiple pieces with each region (locked or
not) treated appropriately. not) treated appropriately.
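
The subdivision of a partially overlapping request might be sketched as follows. The function name, the half-open (start, end) representation of byte ranges, and the assumption that the held locks do not overlap one another are all illustrative choices, not taken from the protocol.

```python
def split_by_locks(offset, length, locks):
    """Split [offset, offset + length) into (start, end, locked) pieces.

    locks is a list of half-open (start, end) byte ranges for which the
    client holds an appropriate lock; they are assumed not to overlap.
    """
    if length <= 0:
        return []
    end = offset + length
    pieces = []
    pos = offset
    for lock_start, lock_end in sorted(locks):
        if lock_end <= pos or lock_start >= end:
            continue  # this lock does not intersect the remaining range
        if lock_start > pos:
            # Gap before the lock: no lock held for these bytes.
            pieces.append((pos, lock_start, False))
            pos = lock_start
        stop = min(lock_end, end)
        pieces.append((pos, stop, True))
        pos = stop
    if pos < end:
        pieces.append((pos, end, False))
    return pieces
```

Under this sketch, pieces marked locked are candidates for the client's cache, while pieces marked unlocked are sent to the server for processing.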
10.3.4. Data Caching and File Identity 10.3.4. Data Caching and File Identity
When clients cache data, the file data needs to be organized When clients cache data, the file data needs to be organized
according to the filesystem object to which the data belongs. For according to the file system object to which the data belongs. For
NFSv3 clients, the typical practice has been to assume for the NFSv3 clients, the typical practice has been to assume for the
purpose of caching that distinct filehandles represent distinct purpose of caching that distinct filehandles represent distinct file
filesystem objects. The client then has the choice to organize and system objects. The client then has the choice to organize and
maintain the data cache on this basis. maintain the data cache on this basis.
In the NFSv4 protocol, there is now the possibility to have In the NFSv4 protocol, there is now the possibility to have
significant deviations from a "one filehandle per object" model significant deviations from a "one filehandle per object" model
because a filehandle may be constructed on the basis of the object's because a filehandle may be constructed on the basis of the object's
pathname. Therefore, clients need a reliable method to determine if pathname. Therefore, clients need a reliable method to determine if
two filehandles designate the same filesystem object. If clients two filehandles designate the same file system object. If clients
were simply to assume that all distinct filehandles denote distinct were simply to assume that all distinct filehandles denote distinct
objects and proceed to do data caching on this basis, caching objects and proceed to do data caching on this basis, caching
inconsistencies would arise between the distinct client side objects inconsistencies would arise between the distinct client side objects
which mapped to the same server side object. which mapped to the same server side object.
By providing a method to differentiate filehandles, the NFSv4 By providing a method to differentiate filehandles, the NFSv4
protocol alleviates a potential functional regression in comparison protocol alleviates a potential functional regression in comparison
with the NFSv3 protocol. Without this method, caching with the NFSv3 protocol. Without this method, caching
inconsistencies within the same client could occur and this has not inconsistencies within the same client could occur and this has not
been present in previous versions of the NFS protocol. Note that it been present in previous versions of the NFS protocol. Note that it
skipping to change at page 157, line 16 skipping to change at page 147, line 35
o space limitation information to control flushing of data on close o space limitation information to control flushing of data on close
(OPEN_DELEGATE_WRITE delegation only, see Section 10.4.1) (OPEN_DELEGATE_WRITE delegation only, see Section 10.4.1)
o an nfsace4 specifying read and write permissions o an nfsace4 specifying read and write permissions
o a stateid to represent the delegation for READ and WRITE o a stateid to represent the delegation for READ and WRITE
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock-owner and will continue stateid, is associated with a particular open-owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
When a request internal to the client is made to open a file and open When a request internal to the client is made to open a file and open
delegation is in effect, it will be accepted or rejected solely on delegation is in effect, it will be accepted or rejected solely on
the basis of the following conditions. Any requirement for other the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in open delegation checks to be made by the delegate should result in open delegation
being denied so that the checks can be made by the server itself. being denied so that the checks can be made by the server itself.
o The access and deny bits for the request and the file as described o The access and deny bits for the request and the file as described
skipping to change at page 158, line 40 skipping to change at page 149, line 11
client will force the server to recall an OPEN_DELEGATE_WRITE client will force the server to recall an OPEN_DELEGATE_WRITE
delegation. A WRITE with a special stateid done by another client delegation. A WRITE with a special stateid done by another client
will force a recall of OPEN_DELEGATE_READ delegations. will force a recall of OPEN_DELEGATE_READ delegations.
With delegations, a client is able to avoid writing data to the With delegations, a client is able to avoid writing data to the
server when the CLOSE of a file is serviced. The file close system server when the CLOSE of a file is serviced. The file close system
call is the usual point at which the client is notified of a lack of call is the usual point at which the client is notified of a lack of
stable storage for the modified file data generated by the stable storage for the modified file data generated by the
application. At the close, file data is written to the server and application. At the close, file data is written to the server and
through normal accounting the server is able to determine if the through normal accounting the server is able to determine if the
available filesystem space for the data has been exceeded (i.e., available file system space for the data has been exceeded (i.e.,
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
includes quotas. The introduction of delegations requires that an includes quotas. The introduction of delegations requires that an
alternative method be in place for the same type of communication to alternative method be in place for the same type of communication to
occur between client and server. occur between client and server.
In the delegation response, the server provides either the limit of In the delegation response, the server provides either the limit of
the size of the file or the number of modified blocks and associated the size of the file or the number of modified blocks and associated
block size. The server must ensure that the client will be able to block size. The server must ensure that the client will be able to
flush data to the server of a size equal to that provided in the flush data to the server of a size equal to that provided in the
original delegation. The server must make this assurance for all original delegation. The server must make this assurance for all
outstanding delegations. Therefore, the server must be careful in outstanding delegations. Therefore, the server must be careful in
its management of available space for new or modified data taking its management of available space for new or modified data taking
into account available filesystem space and any applicable quotas. into account available file system space and any applicable quotas.
The server can recall delegations as a result of managing the The server can recall delegations as a result of managing the
available filesystem space. The client should abide by the server's available file system space. The client should abide by the server's
state space limits for delegations. If the client exceeds the stated state space limits for delegations. If the client exceeds the stated
limits for the delegation, the server's behavior is undefined. limits for the delegation, the server's behavior is undefined.
Based on server conditions, quotas or available filesystem space, the Based on server conditions, quotas or available file system space,
server may grant OPEN_DELEGATE_WRITE delegations with very the server may grant OPEN_DELEGATE_WRITE delegations with very
restrictive space limitations. The limitations may be defined in a restrictive space limitations. The limitations may be defined in a
way that will always force modified data to be flushed to the server way that will always force modified data to be flushed to the server
on close. on close.
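
A client-side check against the space limitation might look like the sketch below. The SpaceLimit record and may_defer_flush are hypothetical names, and counting modified data in bytes rather than in whole blocks is a simplifying assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpaceLimit:
    """Hypothetical view of the limit carried in a delegation response.

    Either filesize is set (limit on the size of the file), or
    num_blocks and bytes_per_block are set (limit on modified blocks).
    """
    filesize: Optional[int] = None
    num_blocks: Optional[int] = None
    bytes_per_block: Optional[int] = None


def may_defer_flush(limit: SpaceLimit, file_size_after_writes: int,
                    modified_bytes: int) -> bool:
    """Return True if modified data may stay cached past close."""
    if limit.filesize is not None:
        return file_size_after_writes <= limit.filesize
    # Simplification: compare bytes against the block budget.
    return modified_bytes <= limit.num_blocks * limit.bytes_per_block
```

A limit of zero blocks (or a file size limit no larger than the current file) makes may_defer_flush return False for any new data, which corresponds to the very restrictive limitations that always force a flush on close.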
With respect to authentication, flushing modified data to the server With respect to authentication, flushing modified data to the server
after a CLOSE has occurred may be problematic. For example, the user after a CLOSE has occurred may be problematic. For example, the user
of the application may have logged off the client and unexpired of the application may have logged off the client and unexpired
authentication credentials may not be present. In this case, the authentication credentials may not be present. In this case, the
client may need to take special care to ensure that local unexpired client may need to take special care to ensure that local unexpired
credentials will in fact be available. This may be accomplished by credentials will in fact be available. This may be accomplished by
skipping to change at page 161, line 7 skipping to change at page 151, line 27
value of d may always be c + 1. value of d may always be c + 1.
While the change attribute is opaque to the client in the sense that While the change attribute is opaque to the client in the sense that
it has no idea what units of time, if any, the server is counting it has no idea what units of time, if any, the server is counting
change with, it is not opaque in that the client has to treat it as change with, it is not opaque in that the client has to treat it as
an unsigned integer, and the server has to be able to see the results an unsigned integer, and the server has to be able to see the results
of the client's changes to that integer. Therefore, the server MUST of the client's changes to that integer. Therefore, the server MUST
encode the change attribute in network order when sending it to the encode the change attribute in network order when sending it to the
client. The client MUST decode it from network order to its native client. The client MUST decode it from network order to its native
order when receiving it and the client MUST encode it in network order order when receiving it and the client MUST encode it in network order
when sending it to the server. For this reason, change is defined as when sending it to the server. For this reason, the change attribute
an unsigned integer rather than an opaque array of bytes. is defined as an unsigned integer rather than an opaque array of
bytes.
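
The byte-order requirement can be sketched with Python's struct module. The helper names are illustrative, and the 64-bit width is assumed from the changeid4 type in the companion XDR description.

```python
import struct


def encode_change(value: int) -> bytes:
    """Encode the change attribute as an unsigned 64-bit integer
    in network (big-endian) byte order."""
    return struct.pack("!Q", value)


def decode_change(wire: bytes) -> int:
    """Decode an 8-byte network-order value into a native integer."""
    (value,) = struct.unpack("!Q", wire)
    return value


# A client that received change value c from the server and modified
# the file locally may report c + 1, re-encoded in network order.
c = decode_change(encode_change(41))
d_wire = encode_change(c + 1)
```

Because both sides agree on the integer interpretation, the server can compare the value the client returns with the value it cached, which would not be possible with an opaque byte array.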
For the server, the following steps will be taken when providing an For the server, the following steps will be taken when providing an
OPEN_DELEGATE_WRITE delegation: OPEN_DELEGATE_WRITE delegation:
o Upon providing an OPEN_DELEGATE_WRITE delegation, the server will o Upon providing an OPEN_DELEGATE_WRITE delegation, the server will
cache a copy of the change attribute in the data structure it uses cache a copy of the change attribute in the data structure it uses
to record the delegation. Let this value be represented by sc. to record the delegation. Let this value be represented by sc.
o When a second client sends a GETATTR operation on the same file to o When a second client sends a GETATTR operation on the same file to
the server, the server obtains the change attribute from the first the server, the server obtains the change attribute from the first
skipping to change at page 163, line 4 skipping to change at page 153, line 22
to avoid its use. to avoid its use.
10.4.4. Recall of Open Delegation 10.4.4. Recall of Open Delegation
The following events necessitate recall of an open delegation: The following events necessitate recall of an open delegation:
o Potentially conflicting OPEN request (or READ/WRITE done with o Potentially conflicting OPEN request (or READ/WRITE done with
"special" stateid) "special" stateid)
o SETATTR issued by another client o SETATTR issued by another client
o REMOVE request for the file o REMOVE request for the file
o RENAME request for the file as either source or target of the o RENAME request for the file as either source or target of the
RENAME RENAME
Whether a RENAME of a directory in the path leading to the file Whether a RENAME of a directory in the path leading to the file
results in recall of an open delegation depends on the semantics of results in recall of an open delegation depends on the semantics of
the server filesystem. If that filesystem denies such RENAMEs when a the server file system. If that file system denies such RENAMEs when
file is open, the recall must be performed to determine whether the a file is open, the recall must be performed to determine whether the
file in question is, in fact, open. file in question is, in fact, open.
In addition to the situations above, the server may choose to recall In addition to the situations above, the server may choose to recall
open delegations at any time if resource constraints make it open delegations at any time if resource constraints make it
advisable to do so. Clients should always be prepared for the advisable to do so. Clients should always be prepared for the
possibility of recall. possibility of recall.
When a client receives a recall for an open delegation, it needs to When a client receives a recall for an open delegation, it needs to
update state on the server before returning the delegation. These update state on the server before returning the delegation. These
same updates must be done whenever a client chooses to return a same updates must be done whenever a client chooses to return a
skipping to change at page 167, line 34 skipping to change at page 158, line 5
modified data to the server on each close must ensure that the user modified data to the server on each close must ensure that the user
receives appropriate notification of the failure as a result of the receives appropriate notification of the failure as a result of the
revocation. Since such situations may require human action to revocation. Since such situations may require human action to
correct problems, notification schemes in which the appropriate user correct problems, notification schemes in which the appropriate user
or administrator is notified may be necessary. Logging and console or administrator is notified may be necessary. Logging and console
messages are typical examples. messages are typical examples.
If there is modified data on the client, it must not be flushed If there is modified data on the client, it must not be flushed
normally to the server. A client may attempt to provide a copy of normally to the server. A client may attempt to provide a copy of
the file data as modified during the delegation under a different the file data as modified during the delegation under a different
name in the filesystem name space to ease recovery. Note that when name in the file system name space to ease recovery. Note that when
the client can determine that the file has not been modified by any the client can determine that the file has not been modified by any
other client, or when the client has a complete cached copy of file other client, or when the client has a complete cached copy of the
in question, such a saved copy of the client's view of the file may file in question, such a saved copy of the client's view of the file
be of particular value for recovery. In other case, recovery using a may be of particular value for recovery. In other cases, recovery
copy of the file based partially on the client's cached data and using a copy of the file based partially on the client's cached data
partially on the server copy as modified by other clients, will be and partially on the server copy as modified by other clients, will
anything but straightforward, so clients may avoid saving file be anything but straightforward, so clients may avoid saving file
contents in these situations or mark the results specially to warn contents in these situations or mark the results specially to warn
users of possible problems. users of possible problems.
Saving of such modified data in delegation revocation situations may Saving of such modified data in delegation revocation situations may
be limited to files of a certain size or might be used only when be limited to files of a certain size or might be used only when
sufficient disk space is available within the target filesystem. sufficient disk space is available within the target file system.
Such saving may also be restricted to situations when the client has Such saving may also be restricted to situations when the client has
sufficient buffering resources to keep the cached copy available sufficient buffering resources to keep the cached copy available
until it is properly stored to the target filesystem. until it is properly stored to the target file system.
10.6. Attribute Caching 10.6. Attribute Caching
The attributes discussed in this section do not include named The attributes discussed in this section do not include named
attributes. Individual named attributes are analogous to files and attributes. Individual named attributes are analogous to files and
caching of the data for these needs to be handled just as data caching of the data for these needs to be handled just as data
caching is for regular files. Similarly, LOOKUP results from an caching is for regular files. Similarly, LOOKUP results from an
OPENATTR directory are to be cached on the same basis as any other OPENATTR directory are to be cached on the same basis as any other
pathnames and similarly for directory contents. pathnames and similarly for directory contents.
skipping to change at page 168, line 33 skipping to change at page 158, line 51
propagated directly to the server but when the modified data is propagated directly to the server but when the modified data is
flushed to the server, analogous attribute changes are made on the flushed to the server, analogous attribute changes are made on the
server. When open delegation is in effect, the modified attributes server. When open delegation is in effect, the modified attributes
may be returned to the server in the response to a CB_GETATTR call. may be returned to the server in the response to a CB_GETATTR call.
The result of local caching of attributes is that the attribute The result of local caching of attributes is that the attribute
caches maintained on individual clients will not be coherent. caches maintained on individual clients will not be coherent.
Changes made in one order on the server may be seen in a different Changes made in one order on the server may be seen in a different
order on one client and in a third order on a different client. order on one client and in a third order on a different client.
The typical filesystem application programming interfaces do not The typical file system application programming interfaces do not
provide means to atomically modify or interrogate attributes for provide means to atomically modify or interrogate attributes for
multiple files at the same time. The following rules provide an multiple files at the same time. The following rules provide an
environment where the potential incoherency mentioned above can be environment where the potential incoherency mentioned above can be
reasonably managed. These rules are derived from the practice of reasonably managed. These rules are derived from the practice of
previous NFS protocols. previous NFS protocols.
o All attributes for a given file (per-fsid attributes excepted) are o All attributes for a given file (per-fsid attributes excepted) are
cached as a unit at the client so that no non-serializability can cached as a unit at the client so that no non-serializability can
arise within the context of a single file. arise within the context of a single file.
o An upper time boundary is maintained on how long a client cache o An upper time boundary is maintained on how long a client cache
entry can be kept without being refreshed from the server. entry can be kept without being refreshed from the server.
o When operations are performed that change attributes at the o When operations are performed that modify attributes at the
server, the updated attribute set is requested as part of the server, the updated attribute set is requested as part of the
containing RPC. This includes directory operations that update containing RPC. This includes directory operations that update
attributes indirectly. This is accomplished by following the attributes indirectly. This is accomplished by following the
modifying operation with a GETATTR operation and then using the modifying operation with a GETATTR operation and then using the
results of the GETATTR to update the client's cached attributes. results of the GETATTR to update the client's cached attributes.
Note that if the full set of attributes to be cached is requested by Note that if the full set of attributes to be cached is requested by
READDIR, the results can be cached by the client on the same basis as READDIR, the results can be cached by the client on the same basis as
attributes obtained via GETATTR. attributes obtained via GETATTR.
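The caching rules above can be illustrated with a small client-side sketch. It is not taken from any implementation: the names AttrCache, max_age, and store_getattr_result are hypothetical, and the clock is injectable only so that the age check can be exercised. It shows attributes stored per file as a unit, an upper time boundary on reuse, and replacement of the entry with the GETATTR results that follow a modifying operation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class AttrCacheEntry:
    attrs: Dict[str, object]   # all attributes for one file, kept as a unit
    fetched_at: float          # when the server last supplied them


@dataclass
class AttrCache:
    max_age: float                                # hypothetical upper time boundary (seconds)
    clock: Callable[[], float] = time.monotonic   # injectable for testing
    entries: Dict[bytes, AttrCacheEntry] = field(default_factory=dict)

    def lookup(self, fh: bytes) -> Optional[Dict[str, object]]:
        """Return cached attributes, or None if absent or older than max_age."""
        entry = self.entries.get(fh)
        if entry is None or self.clock() - entry.fetched_at > self.max_age:
            return None
        return entry.attrs

    def store_getattr_result(self, fh: bytes, attrs: Dict[str, object]) -> None:
        """Replace the whole entry with the results of the GETATTR that
        followed a modifying operation in the same request."""
        self.entries[fh] = AttrCacheEntry(dict(attrs), self.clock())
```

Replacing the entry as a whole, rather than merging individual attributes, is one way to keep the per-file attribute set consistent.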
skipping to change at page 170, line 21 skipping to change at page 160, line 37
occurs and the file is read (or if the block does not exist in the occurs and the file is read (or if the block does not exist in the
file, the block is allocated and then instantiated in the file, the block is allocated and then instantiated in the
application's address space). application's address space).
As long as each memory mapped access to the file requires a page As long as each memory mapped access to the file requires a page
fault, the relevant attributes of the file that are used to detect fault, the relevant attributes of the file that are used to detect
access and modification (time_access, time_metadata, time_modify, and access and modification (time_access, time_metadata, time_modify, and
change) will be updated. However, in many operating environments, change) will be updated. However, in many operating environments,
when page faults are not required these attributes will not be when page faults are not required these attributes will not be
updated on reads or updates to the file via memory access (regardless updated on reads or updates to the file via memory access (regardless
whether the file is local file or is being accessed remotely). A of whether the file is a local file or is being accessed remotely). A
client or server MAY fail to update attributes of a file that is client or server MAY fail to update attributes of a file that is
being accessed via memory mapped I/O. This has several implications: being accessed via memory mapped I/O. This has several implications:
o If there is an application on the server that has memory mapped a o If there is an application on the server that has memory mapped a
file that a client is also accessing, the client may not be able file that a client is also accessing, the client may not be able
to get a consistent value of the change attribute to determine to get a consistent value of the change attribute to determine
whether its cache is stale or not. A server that knows that the whether its cache is stale or not. A server that knows that the
file is memory mapped could always pessimistically return updated file is memory mapped could always pessimistically return updated
values for change so as to force the application to always get the values for change so as to force the application to always get the
most up to date data and metadata for the file. However, due to most up to date data and metadata for the file. However, due to
skipping to change at page 172, line 22 skipping to change at page 162, line 38
mandatory locking for I/O. If mandatory locking is enabled after mandatory locking for I/O. If mandatory locking is enabled after
the file is opened and mapped, the client MAY deny the application the file is opened and mapped, the client MAY deny the application
further access to its mapped file. further access to its mapped file.
10.8. Name Caching 10.8. Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid The results of LOOKUP and READDIR operations may be cached to avoid
the cost of subsequent LOOKUP operations. Just as in the case of the cost of subsequent LOOKUP operations. Just as in the case of
attribute caching, inconsistencies may arise among the various client attribute caching, inconsistencies may arise among the various client
caches. To mitigate the effects of these inconsistencies and given caches. To mitigate the effects of these inconsistencies and given
the context of typical filesystem APIs, an upper time boundary is the context of typical file system APIs, an upper time boundary is
maintained on how long a client name cache entry can be kept without maintained on how long a client name cache entry can be kept without
verifying that the entry has not been made invalid by a directory verifying that the entry has not been made invalid by a directory
change operation performed by another client. change operation performed by another client.
When a client is not making changes to a directory for which there When a client is not making changes to a directory for which there
exist name cache entries, the client needs to periodically fetch exist name cache entries, the client needs to periodically fetch
attributes for that directory to ensure that it is not being attributes for that directory to ensure that it is not being
modified. After determining that no modification has occurred, the modified. After determining that no modification has occurred, the
expiration time for the associated name cache entries may be updated expiration time for the associated name cache entries may be updated
to be the current time plus the name cache staleness bound. to be the current time plus the name cache staleness bound.
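A minimal sketch of the revalidation step described above follows. It assumes the client detects directory modification by comparing the directory's change attribute against the value it saw when the names were cached. That comparison, and the names DirNameCache and revalidate, are illustrative assumptions rather than protocol requirements.

```python
from dataclasses import dataclass


@dataclass
class DirNameCache:
    change: int          # directory change attribute seen when names were cached
    expires_at: float    # time after which the entries must be revalidated
    names: dict          # component name -> filehandle


def revalidate(cache: DirNameCache, current_change: int, now: float,
               staleness_bound: float) -> bool:
    """Apply the result of a periodic GETATTR on the directory.

    Returns True if the cached names are still usable.
    """
    if current_change != cache.change:
        # Directory was modified: discard the names and remember the new value.
        cache.names.clear()
        cache.change = current_change
        cache.expires_at = now
        return False
    # No modification observed: extend the lifetime of the entries.
    cache.expires_at = now + staleness_bound
    return True
```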
skipping to change at page 173, line 22 skipping to change at page 163, line 39
change_info4 return value. When the information is not atomically change_info4 return value. When the information is not atomically
reported, the client should not assume that other clients have not reported, the client should not assume that other clients have not
changed the directory. changed the directory.
10.9. Directory Caching 10.9. Directory Caching
The results of READDIR operations may be used to avoid subsequent The results of READDIR operations may be used to avoid subsequent
READDIR operations. Just as in the cases of attribute and name READDIR operations. Just as in the cases of attribute and name
caching, inconsistencies may arise among the various client caches. caching, inconsistencies may arise among the various client caches.
To mitigate the effects of these inconsistencies, and given the To mitigate the effects of these inconsistencies, and given the
context of typical filesystem APIs, the following rules should be context of typical file system APIs, the following rules should be
followed: followed:
o Cached READDIR information for a directory which is not obtained o Cached READDIR information for a directory which is not obtained
in a single READDIR operation must always be a consistent snapshot in a single READDIR operation must always be a consistent snapshot
of directory contents. This is determined by using a GETATTR of directory contents. This is determined by using a GETATTR
before the first READDIR and after the last READDIR that before the first READDIR and after the last READDIR that
contributes to the cache. contributes to the cache.
o An upper time boundary is maintained to indicate the length of o An upper time boundary is maintained to indicate the length of
time a directory cache entry is considered valid before the client time a directory cache entry is considered valid before the client
skipping to change at page 175, line 44 skipping to change at page 166, line 12
5. Minor versions must not delete operations. 5. Minor versions must not delete operations.
This prevents the potential reuse of a particular operation This prevents the potential reuse of a particular operation
"slot" in a future minor version. "slot" in a future minor version.
6. Minor versions must not delete attributes. 6. Minor versions must not delete attributes.
7. Minor versions must not delete flag bits or enumeration values. 7. Minor versions must not delete flag bits or enumeration values.
8. Minor versions may declare an operation MUST NOT be implement. 8. Minor versions may declare an operation MUST NOT be implemented.
Specifying that an operation MUST NOT be implemented is Specifying that an operation MUST NOT be implemented is
equivalent to obsoleting an operation. For the client, it means equivalent to obsoleting an operation. For the client, it means
that the operation MUST NOT be sent to the server. For the that the operation MUST NOT be sent to the server. For the
server, an NFS error can be returned as opposed to "dropping" server, an NFS error can be returned as opposed to "dropping"
the request as an XDR decode error. This approach allows for the request as an XDR decode error. This approach allows for
the obsolescence of an operation while maintaining its structure the obsolescence of an operation while maintaining its structure
so that a future minor version can reintroduce the operation. so that a future minor version can reintroduce the operation.
9. Minor versions may declare that an attribute MUST NOT be
skipping to change at page 176, line 37 skipping to change at page 167, line 7
infrastructural features to be RECOMMENDED or OPTIONAL infrastructural features to be RECOMMENDED or OPTIONAL
complicates implementation of the minor version. complicates implementation of the minor version.
13. A client MUST NOT attempt to use a stateid, filehandle, or 13. A client MUST NOT attempt to use a stateid, filehandle, or
similar returned object from the COMPOUND procedure with minor similar returned object from the COMPOUND procedure with minor
version X for another COMPOUND procedure with minor version Y, version X for another COMPOUND procedure with minor version Y,
where X != Y. where X != Y.
12. Internationalization 12. Internationalization
This chapter describes the string-handling aspects of the NFSv4 12.1. Introduction
protocol, and how they address issues related to
internationalization, including issues related to UTF-8,
normalization, string preparation, case folding, and handling of
internationalization issues related to domains.
The NFSv4 protocol needs to deal with internationalization, or I18N,
with respect to file names and other strings as used within the
protocol. The choice of string representation must allow for
reasonable name/string access to clients, applications, and users
which use various languages. The UTF-8 encoding of the UCS as
defined by [ISO.10646-1.1993] allows for this type of access and
follows the policy described in "IETF Policy on Character Sets and
Languages", [RFC2277].
In implementing such policies, it is important to understand and
respect the nature of NFSv4 as a means by which client
implementations may invoke operations on remote file systems. Server
implementations act as a conduit to a range of file system
implementations that the NFSv4 server typically invokes through a
virtual-file-system interface.
Keeping this context in mind, one needs to understand that the file
systems with which clients will be interacting will generally not be
devoted solely to access using NFS version 4. Local access and its
requirements will generally be important and often access over other
remote file access protocols will be as well. It is generally a
functional requirement in practice for the users of the NFSv4
protocol (although it may be formally out of scope for this document)
for the implementation to allow files created by other protocols and
by local operations on the file system to be accessed using NFS
version 4 as well.
It also needs to be understood that a considerable portion of file
name processing will occur within the implementation of the file
system rather than within the limits of the NFSv4 server
implementation per se. As a result, certain aspects of name
processing may change as the locus of processing moves from file
system to file system. As a result of these factors, the protocol
cannot enforce uniformity of name-related processing upon NFSv4
server requests on the server as a whole. Because the server
interacts with existing file system implementations, the same server
handling will produce different behavior when interacting with
different file system implementations. To attempt to require uniform
behavior, and treat the protocol server and the file system as a
unified application, would considerably limit the usefulness of the
protocol.
12.1. Use of UTF-8
As mentioned above, UTF-8 is used as a convenient way to encode
Unicode which allows clients that have no internationalization
requirements to avoid these issues since the mapping of ASCII names
to UTF-8 is the identity.
12.1.1. Relation to Stringprep
RFC 3454 [RFC3454], otherwise known as "stringprep", documents a
framework for using Unicode/UTF-8 in networking protocols, intended
"to increase the likelihood that string input and string comparison
work in ways that make sense for typical users throughout the world."
A protocol conforming to this framework must define a profile of
stringprep "in order to fully specify the processing options."
Although NFSv4 makes normative references to stringprep and uses
elements of that framework, it does not, for reasons that are
explained below, conform to that framework for all of the strings
that are used within it.
In addition to some specific issues which have caused stringprep to
add confusion in handling certain characters for certain languages,
there are a number of general reasons why stringprep profiles are not
suitable for describing NFSv4.
o Restricting the character repertoire to Unicode 3.2, as required
by stringprep, is unduly constricting.
o Many of the character tables in stringprep are inappropriate
because of this limited character repertoire, so that normative
reference to stringprep is not desirable in many cases and instead,
we allow more flexibility in the definition of case mapping
tables.
o Because of the presence of different file systems, the specifics
of processing are not fully defined and some aspects are
RECOMMENDED, rather than REQUIRED.
Despite these issues, in many cases the general structure of
stringprep profiles, consisting of sections which deal with the
applicability of the description, the character repertoire, character
mapping, normalization, prohibited characters, and issues of the
handling (i.e., possible prohibition) of bidirectional strings, is a
convenient way to describe the string handling which is needed and
will be used where appropriate.
12.1.2. Normalization, Equivalence, and Confusability
Unicode has defined several equivalence relationships among the set
of possible strings. Understanding the nature and purpose of these
equivalence relations is important to understand the handling of
Unicode strings within NFSv4.
Some string pairs are thought of as differing only in the way accents
and other diacritics are encoded, as illustrated in the examples
below. Such string pairs are called "canonically equivalent".
Such equivalence can occur when there are precomposed characters,
as an alternative to encoding a base character in addition to a
combining accent. For example, the character LATIN SMALL LETTER E
WITH ACUTE (U+00E9) is defined as canonically equivalent to the
string consisting of LATIN SMALL LETTER E followed by COMBINING
ACUTE ACCENT (U+0065, U+0301).
When multiple combining diacritics are present, differences in the
ordering are not reflected in resulting display and the strings
are defined as canonically equivalent. For example, the string
consisting of LATIN SMALL LETTER Q, COMBINING ACUTE ACCENT,
COMBINING GRAVE ACCENT (U+0071, U+0301, U+0300) is canonically
equivalent to the string consisting of LATIN SMALL LETTER Q,
COMBINING GRAVE ACCENT, COMBINING ACUTE ACCENT (U+0071, U+0300,
U+0301).
When both situations are present, the number of canonically
equivalent strings can be greater. Thus, the following strings
are all canonically equivalent:
LATIN SMALL LETTER E, COMBINING MACRON, COMBINING ACUTE ACCENT
(U+0065, U+0304, U+0301)
LATIN SMALL LETTER E, COMBINING ACUTE ACCENT, COMBINING MACRON
(U+0065, U+0301, U+0304)
LATIN SMALL LETTER E WITH MACRON, COMBINING ACUTE ACCENT
(U+0113, U+0301)
LATIN SMALL LETTER E WITH ACUTE, COMBINING MACRON (U+00E9,
U+0304)
LATIN SMALL LETTER E WITH MACRON AND ACUTE (U+1E17)
Additionally there is an equivalence relation of "compatibility
equivalence". Two canonically equivalent strings are necessarily
compatibility equivalent, although not the converse. An example of
compatibility equivalent strings which are not canonically equivalent
are GREEK CAPITAL LETTER OMEGA (U+03A9) and OHM SIGN (U+2126). These
are identical in appearance while other compatibility equivalent
strings are not. Another example would be "x2" and the two character
string denoting x-squared which are clearly different in appearance
although compatibility equivalent and not canonically equivalent.
These have Unicode encodings LATIN SMALL LETTER X, DIGIT TWO (U+0078,
U+0032) and LATIN SMALL LETTER X, SUPERSCRIPT TWO (U+0078, U+00B2),
respectively.
One way to deal with these equivalence relations is via
normalization. A normalization form maps all strings to a
corresponding normalized string in such a fashion that all strings
that are equivalent (canonically or compatibly, depending on the
form) are mapped to the same value. Thus the image of the mapping is
a subset of Unicode strings conceived as the representatives of the
equivalence classes defined by the chosen equivalence relation.
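The relationship between equivalence and normalization can be checked with Python's standard unicodedata module. The sketch below is illustrative only. It uses the precomposed and decomposed forms of e-acute and the x-squared pair from the text above, and the helper names canonically_equivalent and compatibility_equivalent are introduced here for the example.

```python
import unicodedata

precomposed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
x_squared = "x\u00b2"      # LATIN SMALL LETTER X + SUPERSCRIPT TWO


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """Two strings are compatibility equivalent if their NFKD forms match."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# Different code point sequences, one representative per canonical form.
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed

# Canonical forms keep the superscript; compatibility forms fold it to "2".
assert unicodedata.normalize("NFC", x_squared) == x_squared
assert unicodedata.normalize("NFKC", x_squared) == "x2"
```

Comparing normalized forms, rather than raw code point sequences, is what makes each normalized string act as the representative of its equivalence class.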
In the NFSv4 protocol, handling of issues related to
internationalization with regard to normalization follows one of two
basic patterns:
o For strings whose function is related to other internet standards,
such as server and domain naming, the normalization form defined
by the appropriate internet standards is used. For server and
domain naming, this involves normalization form NFKC as specified
in [RFC5891].
o For other strings, particularly those passed by the server to file
system implementations, normalization requirements are the
province of the file system and the job of this specification is
not to specify a particular form but to make sure that
interoperability is maximized, even when clients and server-based
file systems have different preferences.
A related but distinct issue concerns string confusability. This can
occur when two strings (including single-character strings) have a
similar appearance. There have been attempts to define uniform
processing in an attempt to avoid such confusion (see stringprep
[RFC3454]) but the results have often added confusion.
Some examples of possible confusions and proposed processing intended
to reduce/avoid confusions:
o Deletion of characters believed to be invisible and appropriately
ignored, justifying their deletion, including the WORD JOINER
(U+2060) and the ZERO WIDTH SPACE (U+200B).
o Deletion of characters supposed to not bear semantics and only
affect glyph choice, including the ZERO WIDTH NON-JOINER (U+200C)
and the ZERO WIDTH JOINER (U+200D), where the deletion turns out
to be a problem for Farsi speakers.
o Prohibition of space characters such as the EM SPACE (U+2003), the
EN SPACE (U+2002), and the THIN SPACE (U+2009).
In addition, there are character pairs which appear very similar and
could, and often do, result in confusion. Beyond what Unicode defines
as "compatibility equivalence", there are a considerable number of
additional character pairs that could cause confusion. This includes
characters such as LATIN CAPITAL LETTER O (U+004F) and DIGIT ZERO
(U+0030), and CYRILLIC SMALL LETTER ER (U+0440) and LATIN SMALL
LETTER P (U+0070) (also with MATHEMATICAL BOLD SMALL P (U+1D429) and
GREEK SMALL LETTER RHO (U+03C1), for good measure).
NFSv4, as it does with normalization, takes a two-part approach to
this issue:
o For strings whose function is related to other internet standards,
such as server and domain naming, any string processing to address
the confusability issue is defined by the appropriate internet
standards. For server and domain naming, this is the
responsibility of IDNA as described in [RFC5891].
o For other strings, particularly those passed by the server to file
system implementations, any such preparation requirements
including the choice of how, or whether to address the
confusability issue, are the responsibility of the file system to
define, and for this specification to try to add its own set would
add unacceptably to complexity, and make many files accessible
locally and by other remote file access protocols, inaccessible by
NFSv4. This specification defines how the protocol maximizes
interoperability in the face of different file system
implementations. NFSv4 does allow file systems to map and to
reject characters, including those likely to result in confusion,
since file systems may choose to do such things. It defines what
the client will see in such cases, in order to limit problems that
can arise when a file name is created and it appears to have a
different name from the one it is assigned when the name is
created.
12.2. String Type Overview
12.2.1. Overall String Class Divisions
NFSv4 has to deal with a large set of different types of strings and
because of the different role of each, internationalization issues
will be different for each:
o For some types of strings, the fundamental internationalization-
related decisions are the province of the file system or the
security-handling functions of the server and the protocol's job
is to establish the rules under which file systems and servers are
allowed to exercise this freedom, to avoid adding to confusion.
o In other cases, the fundamental internationalization issues are
the responsibility of other IETF groups and our job is simply to
reference those and perhaps make a few choices as to how they are
to be used (e.g., U-labels vs. A-labels).
o There are also cases in which a string has a small amount of NFSv4
processing which results in one or more strings being referred to
one of the other categories.
We will divide strings to be dealt with into the following classes:
MIX: indicating that there is a small amount of preparatory
processing that either picks an internationalization handling mode
or divides the string into a set of (two) strings with a different
mode of internationalization handling for each. The details are
discussed in the section "Types with Pre-processing to Resolve
Mixture Issues".
NIP: indicating that, for various reasons, there is no need for
internationalization-specific processing to be performed. The
specifics of the various string types handled in this way are
described in the section "String Types without
Internationalization Processing".
INET: indicating that the string needs to be processed in a fashion
governed by non-NFS-specific internet specifications. The details
are discussed in the section "Types with Processing Defined by
Other Internet Areas".
NFS: indicating that the string needs to be processed in a fashion
governed by NFSv4-specific considerations. The primary focus is
on enabling flexibility for the various file systems to be
accessed and is described in the section "String Types with NFS-
specific Processing".
12.2.2. Divisions by Typedef Parent types
There are a number of different string types within NFSv4 and
internationalization handling will be different for different types
of strings. Each of the types will be in one of four groups based on
the parent type that specifies the nature of its relationship to utf8
and ascii.
utf8_expected/USHOULD: indicating that strings of this type SHOULD
be UTF-8 but clients and servers will not check for valid UTF-8
encoding.
utf8val_RECOMMENDED4/UVSHOULD: indicating that strings of this type
SHOULD be and generally will be in the form of the UTF-8 encoding
of Unicode. Strings in most cases will be checked by the server
for valid UTF-8 but for certain file systems, such checking may be
inhibited.
utf8val_REQUIRED4/UVMUST: indicating that strings of this type MUST
be in the form of the UTF-8 encoding of Unicode. Strings will be
checked by the server for valid UTF-8 and the server SHOULD ensure
that when sent to the client, they are valid UTF-8.
ascii_REQUIRED4/ASCII: indicating that strings of this type MUST be
sent and validated as ASCII, and thus are automatically UTF-8.
The processing of these strings must ensure that they have only
ASCII characters but this need not be a separate step if any
normally required check for validity inherently assures that only
ASCII characters are present.
In those cases where UTF-8 is not required (USHOULD and UVSHOULD) and
strings that are not valid UTF-8 are received and accepted, the
receiver MUST NOT modify the strings. For example, setting
particular bits such as the high-order bit to zero MUST NOT be done.
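The four parent types can be read as four validation policies. The following sketch is one hypothetical way to express them. The function accept and the flag checking_inhibited are names invented for illustration, and Python's strict UTF-8 decoder stands in for whatever validity check a real receiver would use. The input bytes are only inspected, never altered, in keeping with the MUST NOT above.

```python
from enum import Enum


class Parent(Enum):
    USHOULD = "utf8_expected"
    UVSHOULD = "utf8val_RECOMMENDED4"
    UVMUST = "utf8val_REQUIRED4"
    ASCII = "ascii_REQUIRED4"


def _is_utf8(raw: bytes) -> bool:
    """Strict UTF-8 validity check using the standard decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def accept(raw: bytes, parent: Parent, checking_inhibited: bool = False) -> bool:
    """Return True if a receiver would accept raw for the given parent type.

    The bytes are examined but never modified.
    """
    if parent is Parent.USHOULD:
        return True                              # no validation is performed
    if parent is Parent.UVSHOULD:
        return checking_inhibited or _is_utf8(raw)
    if parent is Parent.UVMUST:
        return _is_utf8(raw)
    return raw.isascii()                         # Parent.ASCII
```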
12.2.3. Individual Types and Their Handling
The first table outlines the handling for the primary string types,
i.e., those not derived as a prefix or a suffix from a mixture type.
+-----------------+----------+-------+------------------------------+
| Type | Parent | Class | Explanation |
+-----------------+----------+-------+------------------------------+
| comptag4 | USHOULD | NIP | Tag expected to be UTF-8 but |
| | | | no validation by server or |
| | | | client is to be done. |
| component4 | UVSHOULD | NFS | Should be utf8 but clients |
| | | | may need to access file |
| | | | systems with a different |
| | | | name structure, such as file |
| | | | systems that have non-utf8 |
| | | | names. |
| linktext4 | UVSHOULD | NFS | Should be utf8 since text |
| | | | may include name components. |
| | | | Because of the need to |
| | | | access existing file |
| | | | systems, this check may be |
| | | | inhibited. |
| fattr4_mimetype | ASCII | NIP | All mime types are ascii so |
| | | | no specific utf8 processing |
| | | | is required, given that you |
| | | | are comparing to that list. |
+-----------------+----------+-------+------------------------------+
Table 5
There are a number of string types that are subject to preliminary
processing. This processing may take the form either of selecting
one of two possible forms based on the string contents or it may
consist of dividing the string into multiple conjoined strings each
with different utf8-related processing.
+---------+--------+-------+----------------------------------------+
| Type | Parent | Class | Explanation |
+---------+--------+-------+----------------------------------------+
| prin4 | UVMUST | MIX | Consists of two parts separated by an |
| | | | at-sign, a prinpfx4 and a prinsfx4. |
| | | | These are described in the next table. |
| server4 | UVMUST | MIX | Is either an IP address (serveraddr4) |
| | | | which has to be pure ascii or a server |
| | | | name svrname4, which is described |
| | | | immediately below. |
+---------+--------+-------+----------------------------------------+
Table 6
The last table describes the components of the compound types
described above.
+----------+--------+-------+----------------------------------------+
| Type     | Parent | Class | Explanation                            |
+----------+--------+-------+----------------------------------------+
| svraddr4 | ASCII  | NIP   | Server as IP address, whether IPv4 or  |
|          |        |       | IPv6.                                  |
| svrname4 | UVMUST | INET  | Server name as returned by server.     |
|          |        |       | Not sent by client, except in          |
|          |        |       | VERIFY/NVERIFY.                        |
| prinsfx4 | UVMUST | INET  | Suffix part of principal, in the form  |
|          |        |       | of a domain name.                      |
| prinpfx4 | UVMUST | NFS   | Must match one of a list of valid      |
|          |        |       | users or groups for that particular    |
|          |        |       | domain.                                |
+----------+--------+-------+----------------------------------------+
Table 7
12.3. Errors Related to Strings
When the client sends an invalid UTF-8 string in a context in which
UTF-8 is REQUIRED, the server MUST return an NFS4ERR_INVAL error.
Within the framework of the previous section, this applies to strings
whose type is defined as utf8val_REQUIRED4 or ascii_REQUIRED4. When
the client sends an invalid UTF-8 string in a context in which UTF-8
is RECOMMENDED and the server should test for UTF-8, the server
SHOULD return an NFS4ERR_INVAL error. Within the framework of the
previous section, this applies to strings whose type is defined as
utf8val_RECOMMENDED4. These situations apply to cases in which
inappropriate prefixes are detected and where the count includes
trailing bytes that do not constitute a full UCS character.
Where the client-supplied string is valid UTF-8 but contains
characters that are not supported by the server file system as a
value for that string (e.g., names containing characters that have
more than two octets on a file system that supports UCS-2 characters
only, file name components containing slashes on file systems that do
not allow them in file name components), the server MUST return an
NFS4ERR_BADCHAR error.
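The two error cases above can be sketched as a single check on a file name component. This is an illustration under stated assumptions, not a description of any server. The function name check_component and its parameters are hypothetical. A UCS-2-only file system is modeled as one that cannot store code points above U+FFFF. The server is assumed to be one that performs the UTF-8 check.

```python
from typing import Optional


def check_component(raw: bytes, ucs2_only: bool = False,
                    forbidden: str = "/") -> Optional[str]:
    """Return the name of the NFSv4 error for a file name component,
    or None if the name passes these checks."""
    try:
        # Strict decoding rejects bad prefixes and truncated trailing bytes.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "NFS4ERR_INVAL"
    for ch in name:
        if ch in forbidden:
            # Valid UTF-8, but a character the file system does not allow.
            return "NFS4ERR_BADCHAR"
        if ucs2_only and ord(ch) > 0xFFFF:
            # Valid UTF-8, but not representable on this file system.
            return "NFS4ERR_BADCHAR"
    return None
```

The validity check comes first because the character-level checks are only meaningful once the bytes are known to decode.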
Where a UTF-8 string is used as a file name component, and the file This section describes NFSv4.0 internationalization as implemented by
system, while supporting all of the characters within the name, does existing clients and servers. As a result, the keywords "MUST",
not allow that particular name to be used, the server should return "SHOULD", and "MAY", even though they retain their normal meanings,
the error NFS4ERR_BADNAME. This includes file system prohibitions of reflect patterns of existing implementation:
"." and ".." as file names for certain operations, and other such
similar constraints. It does not include use of strings with non-
preferred normalization modes.
Where a UTF-8 string is used as a file name component, the file o Behavior implemented by all existing clients or servers is
system implementation MUST NOT return NFS4ERR_BADNAME, simply due to described using "MUST", since new implementations need to follow
a normalization mismatch. In such cases the implementation SHOULD existing ones to be assured of interoperability.
convert the string to its own preferred normalization mode before
performing the operation. As a result, a client cannot assume that a
file created with a name it specifies will have that name when the
directory is read. It may have instead, the name converted to the
file system's preferred normalization form.
Where a UTF-8 string is used as other than as file name component (or o Behavior implemented by no existing clients or servers is
as symbolic link text) and the string does not meet the normalization described using "MUST NOT", if such behavior poses
requirements specified for it, the error NFS4ERR_INVAL is returned. interoperability problems.
12.4. Types with Pre-processing to Resolve Mixture Issues o Behavior implemented by most existing clients or servers, where
that behavior is more desirable than any alternative is described
using "SHOULD", since new implementations need to follow that
existing practice unless there are strong reasons to do otherwise.
This also holds for "SHOULD NOT".
12.4.1. Processing of Principal Strings o Behavior implemented by some, but not all, existing clients or servers
is described using "MAY", indicating that new implementations have
a choice as to whether they will behave in that way.
Strings denoting principals (users or groups) MUST be UTF-8 but since o Behavior implemented by all existing clients or servers, so far as
they consist of a principal prefix, an at-sign, and a domain, all is known, but where there remains some uncertainty as to details
three of which either are checked for being UTF-8, or inherently are is described using "should". Such cases primarily concern details
UTF-8, checking the string as a whole for being UTF-8 is not of error returns. New implementations should follow existing
required. Although a server implementation may choose to make this practice even though such situations generally do not affect
check on the string as whole, for example in converting it to interoperability.
Unicode, the description within this document, will reflect a
processing model in which such checking happens after the division
into a principal prefix and suffix, the latter being in the form of a
domain name.
The string should be scanned for at-signs. If there is more that one 12.2. String Encoding
at-sign, the string is considered invalid. For cases in which there
are no at-signs or the at-sign appears at the start or end of the
string see Interpreting owner and owner_group. Otherwise, the
portion before the at-sign is dealt with as a prinpfx4 and the
portion after is dealt with as a prinsfx4.
12.4.2. Processing of Server Id Strings Strings that potentially contain non-ASCII Characters are represented
in NFSv4 using the UTF-8 encoding of Unicode. See [RFC2279] for
precise encoding and decoding rules.
Server id strings typically appear in responses (as attribute values) Some details of the protocol treatment depend on the type of string:
and only appear in requests as an attribute value presented to VERIFY
and NVERIFY. With that exception, they are not subject to server
validation and possible rejection. It is not expected that clients
will typically do such validation on receipt of responses but they
may as a way to check for proper server behavior. The responsibility
for sending correct UTF-8 strings is with the server.
Servers are identified by either server names or IP addresses. Once o For strings that are component names, any non-ASCII characters
an id has been identified as an IP address, then there is no MUST be represented using the UTF-8 encoding of Unicode.
processing specific to internationalization to be done, since such an
address must be ASCII to be valid.
12.5. String Types without Internationalization Processing o For strings whose form is defined by other internet standards,
non-ASCII characters MUST be represented using the UTF-8 encoding
of Unicode. In addition other sorts of restrictions defined by
those standards need to be addressed. See section 11.4 for
details.
There are a number of types of strings which, for a number of o For other sorts of strings, any non-ASCII characters SHOULD be
different reasons, do not require any internationalization-specific represented using the UTF-8 encoding of Unicode.
handling, such as validation of UTF-8, normalization, or character
mapping or checking. This does not necessarily mean that the strings
need not be UTF-8. In some case, other checking on the string
ensures that they are valid UTF-8, without doing any checking
specific to internationalization.
The following are the specific types: 12.3. Normalization
comptag4: strings are an aid to debugging and the sender should The client and server operating environments may differ in their
avoid confusion by not using anything but valid UTF-8. But any policies and operational methods with respect to character
work validating the string or modifying it would only add normalization (See [Unicode1] for a discussion of normalization
complication to a mechanism whose basic function is best supported forms). This difference may also exist between applications on the
by making it not subject to any checking and having data maximally same client. This adds to the difficulty of providing a single
available to be looked at in a network trace. normalization policy for the protocol that allows for maximal
interoperability. This issue is similar to the character case issues
where the server may or may not support case insensitive file name
matching and may or may not preserve the character case when storing
file names. The protocol does not mandate a particular behavior but
allows for a range of useful behaviors.
fattr4_mimetype: strings need to be validated by matching against a The NFS version 4 protocol does not mandate the use of a particular
list of valid mime types. Since these are all ASCII, no normalization form at this time. A subsequent minor version of the
processing specific to internationalization is required since NFSv4 protocol might specify a particular normalization form.
anything that does not match is invalid and anything which does Therefore, the server and client can expect that they may receive
not obey the rules of UTF-8 will not be ASCII and consequently unnormalized characters within protocol requests and responses. If
will not match, and will be invalid. the operating environment requires normalization, then the
implementation must normalize the various UTF-8 encoded strings
within the protocol before presenting the information to an
application (at the client) or local file system (at the server).
svraddr4: strings, in order to be valid, need to be ASCII, but if Server implementations MAY normalize file names to conform to a
you check them for validity, you have inherently checked that that particular normalization form before using the resulting string when
they are ASCII and thus UTF-8. looking up or creating a file. Servers MAY also perform
normalization-insensitive string comparisons without modifying name
to match a particular normalization form. Servers MUST NOT reject a
file name because it doesn't a conform to a particular normalization
form.
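
As an illustration only, the normalization-insensitive comparison that
servers MAY perform can be sketched as below. This is a minimal
sketch, not part of either draft: the function names (canonical_key,
names_match, lookup) and the choice of NFC as the internal comparison
form are assumptions made for the example, and the directory is
modeled as a plain dictionary from stored names to handles.

```python
import unicodedata


def canonical_key(name: str, form: str = "NFC") -> str:
    """Return a comparison key for a file name.

    The stored name itself is never modified; only the key used for
    comparison is normalized. NFC is an assumed choice here.
    """
    return unicodedata.normalize(form, name)


def names_match(a: str, b: str) -> bool:
    """Compare two names while ignoring normalization differences."""
    return canonical_key(a) == canonical_key(b)


def lookup(directory: dict, name: str):
    """Find an entry whose stored name matches `name`.

    `directory` is a hypothetical mapping of stored names to handles.
    A name is never rejected for being unnormalized; it simply fails
    to match when no equivalent stored name exists.
    """
    for stored, handle in directory.items():
        if names_match(stored, name):
            return handle
    return None
```

With this approach a name sent in decomposed form (for example "e"
followed by U+0301) finds a file stored in precomposed form (U+00E9),
and the stored name is returned unchanged when the directory is read.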
12.6. Types with Processing Defined by Other Internet Areas 12.4. Types with Processing Defined by Other Internet Areas
There are two types of strings which NFSv4 deals with whose
processing is defined by other Internet standards, and where issues
related to different handling choices by server operating systems or
server file systems do not apply.

These are as follows:

o Server names as they appear in the fs_locations attribute. Note
that for most purposes, such server names will only be sent by the
skipping to change at page 187, line 33 skipping to change at page 169, line 24
are similar and independent of the role of the sender or receiver as
client or server although the consequences of failure to obey these
rules may be different for client or server. The server can report
errors when it is sent invalid strings, whereas the client will
simply ignore invalid strings or use a default value in their place.

The string sent SHOULD be in the form of a U-label although it MAY be
in the form of an A-label or a UTF-8 string that would not map to
itself when canonicalized by applying ToUnicode(ToASCII(...)). The
receiver needs to be able to accept domain and server names in any of
the formats allowed. The server MUST reject, using the error
NFS4ERR_INVAL, a string which is not valid UTF-8 or which begins with
"xn--" and violates the rules for a valid A-label.

When a domain string is part of id@domain or group@domain, the server
SHOULD map domain strings which are A-labels or are UTF-8 domain
names which are not U-labels, to the corresponding U-label, using
ToUnicode(domain) or ToUnicode(ToASCII(domain)). As a result, the
domain name returned within a userid on a GETATTR may not match that
sent when the userid is set using SETATTR, although when this
happens, the domain will be in the form of a U-label. When the
server does not map domain strings which are not U-labels into a
U-label, which it MAY do, it MUST NOT modify the domain and the
domain returned on a GETATTR of the userid MUST be the same as that
used when setting the userid by the SETATTR.
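
The rejection and mapping rules for domain strings described above can
be sketched roughly as follows. This is an approximation under stated
assumptions, not the processing either draft defines: the function
name map_domain is hypothetical, Python's built-in "punycode" codec
stands in for the A-label portion of ToUnicode, and the nameprep and
round-trip checks that a full IDNA implementation performs are
omitted. The numeric values are the NFSv4 status codes for NFS4_OK
and NFS4ERR_INVAL.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22

ACE_PREFIX = "xn--"


def map_domain(raw: bytes):
    """Validate a domain string and map A-labels toward U-label form.

    Returns (status, mapped_domain). mapped_domain is None on error.
    Simplification: only UTF-8 validity and Punycode decodability are
    checked; full A-label validity requires more than this.
    """
    try:
        text = raw.decode("utf-8")  # strict: rejects invalid UTF-8
    except UnicodeDecodeError:
        return NFS4ERR_INVAL, None

    mapped = []
    for label in text.split("."):
        if label.lower().startswith(ACE_PREFIX):
            ace = label[len(ACE_PREFIX):]
            if not ace:
                return NFS4ERR_INVAL, None
            try:
                # An A-label must be pure ASCII and decodable.
                label = ace.encode("ascii").decode("punycode")
            except UnicodeError:
                return NFS4ERR_INVAL, None
        mapped.append(label)
    return NFS4_OK, ".".join(mapped)
```

A server that chooses not to map would skip the decoding step and
store the domain exactly as received, so that a later GETATTR returns
the same string that SETATTR supplied.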

The server MAY implement VERIFY and NVERIFY without translating
internal state to a string form, so that, for example, a user
principal which represents a specific numeric user id, will match a
different principal string which represents the same numeric user id.

12.7. String Types with NFS-specific Processing 12.5. UTF-8 Related Errors
For a number of data types within NFSv4, the primary responsibility
for internationalization-related handling is that of some entity
other than the server itself (see below for details). In these
situations, the primary responsibility of NFSv4 is to provide a
framework in which that other entity (file system and server
operating system principal naming framework) implements its own