draft-ietf-nfsv4-mv1-msns-update-01.txt   draft-ietf-nfsv4-mv1-msns-update-02.txt 
NFSv4 D. Noveck, Ed. NFSv4 D. Noveck, Ed.
Internet-Draft NetApp Internet-Draft NetApp
Updates: 5661 (if approved) C. Lever Updates: 5661 (if approved) C. Lever
Intended status: Standards Track ORACLE Intended status: Standards Track ORACLE
Expires: December 11, 2018 June 9, 2018 Expires: April 24, 2019 October 21, 2018
NFSv4.1 Update for Multi-Server Namespace NFS Version 4.1 Update for Multi-Server Namespace
draft-ietf-nfsv4-mv1-msns-update-01 draft-ietf-nfsv4-mv1-msns-update-02
Abstract Abstract
This document presents necessary clarifications and corrections This document presents necessary clarifications and corrections
concerning features related to the use of location-related attributes concerning features related to the use of location-related attributes
in NFSv4.1. These include migration, which transfers responsibility in NFSv4.1. These include migration, which transfers responsibility
for a file system from one server to another, and facilities to for a file system from one server to another, and facilities to
support trunking by allowing discovery of the set of network support trunking by allowing discovery of the set of network
addresses to use to access a file system. This document updates addresses to use to access a file system. This document updates
RFC5661. RFC5661.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 11, 2018. This Internet-Draft will expire on April 24, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4
3. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 4
3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
3.2. Summary of Issues . . . . . . . . . . . . . . . . . . . . 6 3.2. Summary of Issues . . . . . . . . . . . . . . . . . . . . 7
3.3. Relationship of this Document to RFC5661 . . . . . . . . 8 3.3. Relationship of this Document to RFC5661 . . . . . . . . 9
4. Changes to Section 11 of RFC5661 . . . . . . . . . . . . . . 9 4. Changes to Section 11 of RFC5661 . . . . . . . . . . . . . . 10
4.1. Multi-Server Namespace (as updated) . . . . . . . . . . . 10 4.1. Multi-Server Namespace (as updated) . . . . . . . . . . . 11
4.2. Location-related Terminology (to be added) . . . . . . . 10 4.2. Location-related Terminology (to be added) . . . . . . . 11
4.3. Location Attributes (as updated) . . . . . . . . . . . . 12 4.3. Location Attributes (as updated) . . . . . . . . . . . . 13
4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 . . 13 4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 . . 14
4.5. Uses of Location Information (as updated) . . . . . . . . 13 4.5. Uses of Location Information (as updated) . . . . . . . . 14
4.5.1. Combining Multiple Uses in a Single Attribute (to be 4.5.1. Combining Multiple Uses in a Single Attribute (to be
added) . . . . . . . . . . . . . . . . . . . . . . . 14 added) . . . . . . . . . . . . . . . . . . . . . . . 15
4.5.2. Location Attributes and Trunking (to be added) . . . 15 4.5.2. Location Attributes and Trunking (to be added) . . . 16
4.5.3. Location Attributes and Connection Type Selection (to 4.5.3. Location Attributes and Connection Type Selection (to
be added) . . . . . . . . . . . . . . . . . . . . . . 15 be added) . . . . . . . . . . . . . . . . . . . . . . 16
4.5.4. File System Replication (as updated) . . . . . . . . 16 4.5.4. File System Replication (as updated) . . . . . . . . 17
4.5.5. File System Migration (as updated) . . . . . . . . . 16 4.5.5. File System Migration (as updated) . . . . . . . . . 17
4.5.6. Referrals (as updated) . . . . . . . . . . . . . . . 17 4.5.6. Referrals (as updated) . . . . . . . . . . . . . . . 19
4.5.7. Changes in a Location Attribute (to be added) . . . . 19 4.5.7. Changes in a Location Attribute (to be added) . . . . 20
5. Re-organization of Section 11.7 of RFC5661 . . . . . . . . . 20 5. Re-organization of Section 11.7 of RFC5661 . . . . . . . . . 21
6. Overview of File Access Transitions (to be added) . . . . . . 20 6. Overview of File Access Transitions (to be added) . . . . . . 22
7. Effecting Network Endpoint Transitions (to be added) . . . . 21 7. Effecting Network Endpoint Transitions (to be added) . . . . 22
8. Effecting File System Transitions (as updated) . . . . . . . 22 8. Effecting File System Transitions (as updated) . . . . . . . 23
8.1. File System Transitions and Simultaneous Access (as 8.1. File System Transitions and Simultaneous Access (as
updated) . . . . . . . . . . . . . . . . . . . . . . . . 23 updated) . . . . . . . . . . . . . . . . . . . . . . . . 24
8.2. Filehandles and File System Transitions (as updated) . . 23 8.2. Filehandles and File System Transitions (as updated) . . 24
8.3. Fileids and File System Transitions (as updated) . . . . 24 8.3. Fileids and File System Transitions (as updated) . . . . 25
8.4. Fsids and File System Transitions (as updated) . . . . . 25 8.4. Fsids and File System Transitions (as updated) . . . . . 26
8.4.1. File System Splitting (as updated) . . . . . . . . . 25 8.4.1. File System Splitting (as updated) . . . . . . . . . 26
8.5. The Change Attribute and File System Transitions (as 8.5. The Change Attribute and File System Transitions (as
updated) . . . . . . . . . . . . . . . . . . . . . . . . 26 updated) . . . . . . . . . . . . . . . . . . . . . . . . 27
8.6. Write Verifiers and File System Transitions (as updated) 26 8.6. Write Verifiers and File System Transitions (as updated) 27
8.7. Readdir Cookies and Verifiers and File System Transitions 8.7. Readdir Cookies and Verifiers and File System Transitions
(as updated) . . . . . . . . . . . . . . . . . . . . . . 26 (as updated) . . . . . . . . . . . . . . . . . . . . . . 28
8.8. File System Data and File System Transitions (as updated) 27 8.8. File System Data and File System Transitions (as updated) 28
8.9. Lock State and File System Transitions (as updated) . . . 28 8.9. Lock State and File System Transitions (as updated) . . . 29
9. Transferring State upon Migration (to be added) . . . . . . . 29 9. Transferring State upon Migration (to be added) . . . . . . . 30
9.1. Transparent State Migration and pNFS (to be added) . . . 29 9.1. Transparent State Migration and pNFS (to be added) . . . 31
10. Client Responsibilities when Access is Transitioned (to be 10. Client Responsibilities when Access is Transitioned (to be
added) . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 added) . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
10.1. Client Transition Notifications (to be added) . . . . . 31 10.1. Client Transition Notifications (to be added) . . . . . 32
10.2. Performing Migration Discovery (to be added) . . . . . . 33 10.2. Performing Migration Discovery (to be added) . . . . . . 35
10.3. Overview of Client Response to NFS4ERR_MOVED (to be 10.3. Overview of Client Response to NFS4ERR_MOVED (to be
added) . . . . . . . . . . . . . . . . . . . . . . . . . 36 added) . . . . . . . . . . . . . . . . . . . . . . . . . 37
10.4. Obtaining Access to Sessions and State after Migration 10.4. Obtaining Access to Sessions and State after Migration
(to be added) . . . . . . . . . . . . . . . . . . . . . 37 (to be added) . . . . . . . . . . . . . . . . . . . . . 39
10.5. Obtaining Access to Sessions and State after Network 10.5. Obtaining Access to Sessions and State after Network
Address Transfer (to be added) . . . . . . . . . . . . . 39 Address Transfer (to be added) . . . . . . . . . . . . . 41
11. Server Responsibilities Upon Migration (to be added) . . . . 40 11. Server Responsibilities Upon Migration (to be added) . . . . 41
11.1. Server Responsibilities in Effecting Transparent State 11.1. Server Responsibilities in Effecting State Reclaim after
Migration (to be added) . . . . . . . . . . . . . . . . 40 Migration (to be added) . . . . . . . . . . . . . . . . 42
11.2. Server Responsibilities in Effecting Session Transfer 11.2. Server Responsibilities in Effecting Transparent State
(to be added) . . . . . . . . . . . . . . . . . . . . . 42 Migration (to be added) . . . . . . . . . . . . . . . . 42
12. Changes to RFC5661 outside Section 11 . . . . . . . . . . . . 44 11.3. Server Responsibilities in Effecting Session Transfer
12.1. (Introduction to) Multi-Server Namespace (as updated) . 45 (to be added) . . . . . . . . . . . . . . . . . . . . . 44
12.2. Server Scope (as updated) . . . . . . . . . . . . . . . 46 12. fs_locations_info . . . . . . . . . . . . . . . . . . . . . . 46
12.3. Revised Treatment of NFS4ERR_MOVED . . . . . . . . . . . 47 12.1. Updates to treatment of fs_locations_info . . . . . . . 47
12.4. Revised Discussion of Server_owner changes . . . . . . . 48 12.2. The Attribute fs_locations_info (as updated) . . . . . . 47
12.5. Revision to Treatment of EXCHANGE_ID . . . . . . . . . . 49 12.2.1. The fs_locations_server4 Structure (as updated) . . 51
13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as 12.2.2. The fs_locations_info4 Structure (as updated) . . . 57
updated) . . . . . . . . . . . . . . . . . . . . . . . . . . 50 12.2.3. The fs_locations_item4 Structure (as updated) . . . 59
14. Security Considerations . . . . . . . . . . . . . . . . . . . 68 13. Changes to RFC5661 outside Section 11 . . . . . . . . . . . . 61
15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 70 13.1. (Introduction to) Multi-Server Namespace (as updated) . 62
16. References . . . . . . . . . . . . . . . . . . . . . . . . . 71 13.2. Server Scope (as updated) . . . . . . . . . . . . . . . 62
16.1. Normative References . . . . . . . . . . . . . . . . . . 71 13.3. Revised Treatment of NFS4ERR_MOVED . . . . . . . . . . . 64
16.2. Informative References . . . . . . . . . . . . . . . . . 72 13.4. Revised Discussion of Server_owner changes . . . . . . . 65
Appendix A. Classification of Document Sections . . . . . . . . 72 13.5. Revision to Treatment of EXCHANGE_ID . . . . . . . . . . 65
Appendix B. Updates to RFC5661 . . . . . . . . . . . . . . . . . 74 13.6. Revision to Treatment of RECLAIM_COMPLETE . . . . . . . 67
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 76 13.7. Reclaim Errors (as updated) . . . . . . . . . . . . . . 67
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 77 13.7.1. NFS4ERR_COMPLETE_ALREADY (as updated; Error Code
10054) . . . . . . . . . . . . . . . . . . . . . . . 67
13.7.2. NFS4ERR_GRACE (as updated; Error Code 10013) . . . . 67
13.7.3. NFS4ERR_NO_GRACE (as updated; Error Code 10033) . . 67
13.7.4. NFS4ERR_RECLAIM_BAD (as updated; Error Code 10034) . 68
13.7.5. NFS4ERR_RECLAIM_CONFLICT (as updated; Error Code
10035) . . . . . . . . . . . . . . . . . . . . . . . 68
14. Operation 42: EXCHANGE_ID - Instantiate Client ID (as
updated) . . . . . . . . . . . . . . . . . . . . . . . . . . 68
15. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished
(as updated) . . . . . . . . . . . . . . . . . . . . . . . . 86
16. Security Considerations . . . . . . . . . . . . . . . . . . . 90
17. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 92
18. References . . . . . . . . . . . . . . . . . . . . . . . . . 92
18.1. Normative References . . . . . . . . . . . . . . . . . . 92
18.2. Informative References . . . . . . . . . . . . . . . . . 93
Appendix A. Classification of Document Sections . . . . . . . . 94
Appendix B. Updates to RFC5661 . . . . . . . . . . . . . . . . . 95
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 98
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 99
1. Introduction 1. Introduction
This document defines the proper handling, within NFSv4.1, of the This document defines the proper handling, within NFSv4.1, of the
location-related attributes fs_locations and fs_locations_info and location-related attributes fs_locations and fs_locations_info and
how necessary changes in those attributes are to be dealt with. The how necessary changes in those attributes are to be dealt with. The
necessary corrections and clarifications parallel those done for necessary corrections and clarifications parallel those done for
NFSv4.0 in [RFC7931] and [I-D.cel-nfsv4-mv0-trunking-update]. NFSv4.0 in [RFC7931] and [I-D.cel-nfsv4-mv0-trunking-update].
A large part of the changes to be made are necessary to clarify the A large part of the changes to be made are necessary to clarify the
skipping to change at page 4, line 10 skipping to change at page 4, line 29
Another important issue to be dealt with concerns the handling of Another important issue to be dealt with concerns the handling of
multiple entries within location-related attributes that represent multiple entries within location-related attributes that represent
different ways to access the same file system. Unfortunately different ways to access the same file system. Unfortunately
[RFC5661], while recognizing that these entries can represent [RFC5661], while recognizing that these entries can represent
different ways to access the same file system, confuses the matter by different ways to access the same file system, confuses the matter by
treating network access paths as "replicas", making it difficult for treating network access paths as "replicas", making it difficult for
these attributes to be used to obtain information about the network these attributes to be used to obtain information about the network
addresses to be used to access particular file system instances and addresses to be used to access particular file system instances and
engendering confusion between two different sorts of transition: engendering confusion between two different sorts of transition:
those involving a change of network access paths to the same file those involving a change of network access paths to the same file
system instance and those in which there is shift between two system instance and those in which there is a shift between two
distinct replicas. distinct replicas.
When location information is used to determine the set of network When location information is used to determine the set of network
addresses to access a particular file system instance (i.e. to addresses to access a particular file system instance (i.e. to
perform trunking discovery), clarification is needed regarding the perform trunking discovery), clarification is needed regarding the
interaction of trunking and transitions between file system replicas, interaction of trunking and transitions between file system replicas,
including migration. Unfortunately [RFC5661], while it provided a including migration. Unfortunately [RFC5661], while it provided a
method of determining whether two network addresses were connected to method of determining whether two network addresses were connected to
the same server, did not address the issue of trunking discovery, the same server, did not address the issue of trunking discovery,
making it necessary to address it in this document. making it necessary to address it in this document.
skipping to change at page 5, line 15 skipping to change at page 5, line 37
version, and, in some cases, on the client implementation. version, and, in some cases, on the client implementation.
In the case of NFS version 4.1 and later minor versions, the means In the case of NFS version 4.1 and later minor versions, the means
of trunking detection are as described by [RFC5661] and are of trunking detection are as described by [RFC5661] and are
available to every client. Two network addresses connected to the available to every client. Two network addresses connected to the
same server are always server-trunkable but are not necessarily same server are always server-trunkable but are not necessarily
session-trunkable. session-trunkable.
o Trunking discovery is a process by which a client using one o Trunking discovery is a process by which a client using one
network address can obtain other addresses that are connected to network address can obtain other addresses that are connected to
the same server Typically it builds on a trunking detection the same server. Typically it builds on a trunking detection
facility by providing one or more methods by which candidate facility by providing one or more methods by which candidate
addresses are made available to the client who can then use addresses are made available to the client who can then use
trunking detection to appropriately filter them. trunking detection to appropriately filter them.
Despite the support for trunking detection there was no Despite the support for trunking detection there was no
description of trunking discovery provided in [RFC5661]. description of trunking discovery provided in [RFC5661].
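The relationship between trunking discovery and trunking detection described above can be illustrated with a minimal sketch. It assumes the detection rules of [RFC5661], under which the client compares the eir_server_owner and eir_server_scope values returned by EXCHANGE_ID on each address. The names ServerIdentity, server_trunkable, session_trunkable, and filter_candidates are hypothetical and appear in neither draft. The sketch is also simplified: it omits the client ID comparison and the verification steps that a real client would perform.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServerIdentity:
    """Hypothetical subset of an EXCHANGE_ID result used for trunking detection."""
    major_id: bytes  # eir_server_owner.so_major_id
    minor_id: int    # eir_server_owner.so_minor_id
    scope: bytes     # eir_server_scope


def server_trunkable(a: ServerIdentity, b: ServerIdentity) -> bool:
    # Assumption: same scope and same major id means both addresses
    # are connected to the same server.
    return a.scope == b.scope and a.major_id == b.major_id


def session_trunkable(a: ServerIdentity, b: ServerIdentity) -> bool:
    # Session trunking additionally requires matching minor ids, so
    # server-trunkable addresses are not necessarily session-trunkable.
    return server_trunkable(a, b) and a.minor_id == b.minor_id


def filter_candidates(current: ServerIdentity,
                      candidates: Dict[str, ServerIdentity]) -> List[str]:
    # Trunking discovery supplies the candidate addresses; trunking
    # detection filters out those connected to a different server.
    return [addr for addr, ident in candidates.items()
            if server_trunkable(current, ident)]
```

In this sketch, discovery (for example, via a location attribute) would supply the candidates mapping, and the client would keep only those addresses that detection confirms as server-trunkable with the address already in use.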
Regarding network addresses and the handling of trunking we use the Regarding network addresses and the handling of trunking we use the
following terminology: following terminology:
o Each NFSv4 server is assumed to have a set of IP addresses to o Each NFSv4 server is assumed to have a set of IP addresses to
which NFSv4 requests may be sent by clients. These are referred which NFSv4 requests may be sent by clients. These are referred
to as the server's network addresses. Access to a specfic server to as the server's network addresses. Access to a specific server
network address may involve the use of multiple ports, since the network address may involve the use of multiple ports, since the
ports to be used for various types of connections might be ports to be used for various types of connections might be
required to be different. required to be different.
o Each network address, when combined with a pathname providing the o Each network address, when combined with a pathname providing the
location of a file system root directory relative to the location of a file system root directory relative to the
associated server root file handle, defines a file system network associated server root file handle, defines a file system network
access path. access path.
o Server network addresses are used to establish connections to o Server network addresses are used to establish connections to
skipping to change at page 7, line 34 skipping to change at page 8, line 8
instance (i.e. trunking) was often treated as if two replicas were instance (i.e. trunking) was often treated as if two replicas were
involved, it was considered that two replicas were being used involved, it was considered that two replicas were being used
simultaneously. As a result, the treatment of replicas being used simultaneously. As a result, the treatment of replicas being used
simultaneously in [RFC5661] was not clear as it covered the two simultaneously in [RFC5661] was not clear as it covered the two
distinct cases of a single file system instance being accessed by distinct cases of a single file system instance being accessed by
two different network access paths and two replicas being accessed two different network access paths and two replicas being accessed
simultaneously, with the limitations of the latter case not being simultaneously, with the limitations of the latter case not being
clearly laid out. clearly laid out.
The majority of the consequences of these issues are dealt with via The majority of the consequences of these issues are dealt with via
the updates in various subsections of Section 4 of the current the updates in various subsections of Section 4 and the whole of
document which deal with problems within Section 11 of [RFC5661]. Section 12 within the current document which deal with problems
These include: within Section 11 of [RFC5661]. These changes include:
o Reorganization made necessary by the fact that two network access o Reorganization made necessary by the fact that two network access
paths to the same file system instance need to be distinguished paths to the same file system instance need to be distinguished
clearly from two different replicas since the former share locking clearly from two different replicas since the former share locking
state and can share session state. state and can share session state.
o The need for a clear statement regarding the desirability of o The need for a clear statement regarding the desirability of
transparent transfer of state together with a recommendation that transparent transfer of state together with a recommendation that
either that or a single-fs grace period be provided. either that or a single-fs grace period be provided.
o Specifically delineating how such transfers are to be dealt with o Specifically delineating how such transfers are to be dealt with
by the client, taking into account the differences from the by the client, taking into account the differences from the
treatment in [RFC7931] made necessary by the major protocol treatment in [RFC7931] made necessary by the major protocol
changes made in NFSv4.1. changes made in NFSv4.1.
o Discussion of the relationship between transparent state transfer o Discussion of the relationship between transparent state transfer
and Parallel NFS (pNFS). and Parallel NFS (pNFS).
o A clarification of the fs_locations_info attribute to specify
which portions of the information provided apply to a specific
network access path and which to the replica which that path is
used to access.
In addition, there are also updates to other sections of [RFC5661], In addition, there are also updates to other sections of [RFC5661],
where the consequences of the incorrect assumptions underlying the where the consequences of the incorrect assumptions underlying the
current treatment of multi-server namespace issues also need to be current treatment of multi-server namespace issues also need to be
corrected. These are to be dealt with as described in various corrected. These are to be dealt with as described in Sections 13
subsections of Section 12 of the current document. through 15 of the current document.
o A revised introductory section regarding multi-server namespace o A revised introductory section regarding multi-server namespace
facilities is provided. facilities is provided.
o A more realistic treatment of server scope is provided, which o A more realistic treatment of server scope is provided, which
reflects the more limited co-ordination of locking state adopted reflects the more limited co-ordination of locking state adopted
by servers actually sharing a common server scope. by servers actually sharing a common server scope.
o Some confusing text regarding changes in server_owner needs to be o Some confusing text regarding changes in server_owner needs to be
clarified. clarified.
o The description of NFS4ERR_MOVED needs to be updated since two o The description of NFS4ERR_MOVED needs to be updated since two
different network access paths to the same file system are no different network access paths to the same file system are no
longer considered to be two instances of the same file system. longer considered to be two instances of the same file system.
o A new treatment of EXCHANGE_ID is needed, replacing that which o A new treatment of EXCHANGE_ID is needed, replacing that which
appeared in Section 18.35 of [RFC5661] appeared in Section 18.35 of [RFC5661]. This is necessary since
the existing treatment of client id confirmation does not make
sense in the context of transparent state migration, in which
client ids are transferred between source and destination servers.
o A new treatment of RECLAIM_COMPLETE is needed, replacing that
which appeared in Section 18.51 of [RFC5661]. This is necessary
to clarify the function of the one-fs flag and clarify how
existing clients, that might not properly use this flag, are to be
dealt with.
3.3. Relationship of this Document to RFC5661 3.3. Relationship of this Document to RFC5661
The role of this document is to explain and specify a set of needed The role of this document is to explain and specify a set of needed
changes to [RFC5661]. All of these changes are related to the multi- changes to [RFC5661]. All of these changes are related to the multi-
server namespace features of NFSv4.1. server namespace features of NFSv4.1.
This document contains sections that propose additions to and other This document contains sections that propose additions to and other
modifications of [RFC5661] as well as others that explain the reasons modifications of [RFC5661] as well as others that explain the reasons
for modifications but do not directly affect existing specifications. for modifications but do not directly affect existing specifications.
skipping to change at page 9, line 25 skipping to change at page 10, line 14
o Editing sections contain some text that replaces text within o Editing sections contain some text that replaces text within
[RFC5661], although the entire section will not consist of such [RFC5661], although the entire section will not consist of such
text and will include other text as well. Such sections make text and will include other text as well. Such sections make
relatively minor adjustments in the existing NFSv4.1 specification relatively minor adjustments in the existing NFSv4.1 specification
which are expected to be reflected in an eventual consolidated which are expected to be reflected in an eventual consolidated
document. Generally such replacement text appears as a quotation, document. Generally such replacement text appears as a quotation,
which may take the form of an indented set of paragraphs. which may take the form of an indented set of paragraphs.
See Appendix A for a classification of the sections of this document See Appendix A for a classification of the sections of this document
according the categories above. according to the categories above.
When this document is approved and published, [RFC5661] would be When this document is approved and published, [RFC5661] would be
significantly updated with most of the changed sections within the significantly updated with most of the changed sections within the
current Section 11 of that document. A detailed discussion of the current Section 11 of that document. A detailed discussion of the
necessary updates can be found in Appendix B. necessary updates can be found in Appendix B.
4. Changes to Section 11 of RFC5661 4. Changes to Section 11 of RFC5661
A number of sections need to be revised, replacing existing sub- A number of sections need to be revised, replacing existing sub-
sections within section 11 of [RFC5661]: sections within section 11 of [RFC5661]:
skipping to change at page 10, line 11 skipping to change at page 10, line 47
New material relating to the handling of the location attributes New material relating to the handling of the location attributes
is contained in Sections 4.5.1 and 4.5.7 below. is contained in Sections 4.5.1 and 4.5.7 below.
o A major replacement for the existing Section 11.7 of [RFC5661] o A major replacement for the existing Section 11.7 of [RFC5661]
entitled "Effecting File System Transitions", will appear as entitled "Effecting File System Transitions", will appear as
Sections 6 through 11 of the current document. The reasons for Sections 6 through 11 of the current document. The reasons for
the reorganization of this section into multiple sections are the reorganization of this section into multiple sections are
discussed below in Section 5 of the current document. discussed below in Section 5 of the current document.
o A replacement for the existing Section 11.10 of [RFC5661] entitled
"The Attribute fs_locations_info", will appear as Section 12.2 of
the current document, with Section 12.1 describing the differences
between the new section and the treatment within [RFC5661]. A
revised treatment is necessary because the existing treatment did
not make clear how the added attribute information relates to the
case of trunked paths to the same replica. These issues were not
addressed in [RFC5661] where the concepts of a replica and a
network path used to access a replica were not clearly
distinguished.
4.1. Multi-Server Namespace (as updated) 4.1. Multi-Server Namespace (as updated)
NFSv4.1 supports attributes that allow a namespace to extend beyond NFSv4.1 supports attributes that allow a namespace to extend beyond
the boundaries of a single server. It is desirable that clients and the boundaries of a single server. It is desirable that clients and
servers support construction of such multi-server namespaces. Use of servers support construction of such multi-server namespaces. Use of
such multi-server namespaces is OPTIONAL however, and for many such multi-server namespaces is OPTIONAL however, and for many
purposes, single-server namespaces are perfectly acceptable. Use of purposes, single-server namespaces are perfectly acceptable. Use of
multi-server namespaces can provide many advantages, by separating a multi-server namespaces can provide many advantages, by separating a
file system's logical position in a namespace from the (possibly file system's logical position in a namespace from the (possibly
changing) logistical and administrative considerations that result in changing) logistical and administrative considerations that result in
skipping to change at page 10, line 39 skipping to change at page 11, line 37
by NFSv4 clients. Typically, this is done by assigning each file by NFSv4 clients. Typically, this is done by assigning each file
system a name within the pseudo-fs associated with the server, system a name within the pseudo-fs associated with the server,
although the pseudo-fs may be dispensed with if there is only a although the pseudo-fs may be dispensed with if there is only a
single exported file system. Each such file system is part of the single exported file system. Each such file system is part of the
server's local namespace, and can be considered as a file system server's local namespace, and can be considered as a file system
instance within a larger multi-server namespace. instance within a larger multi-server namespace.
o The set of all exported file systems for a given server o The set of all exported file systems for a given server
constitutes that server's local namespace. constitutes that server's local namespace.
o In some cases, a server will have a namespace, more extensive than o In some cases, a server will have a namespace more extensive than
its local namespace, by using features associated with attributes its local namespace, by using features associated with attributes
that provide location information. These features, which allow that provide location information. These features, which allow
construction of a multi-server namespace, are all described in construction of a multi-server namespace, are all described in
individual sections below and include referrals (described in individual sections below and include referrals (described in
Section 4.5.6), migration (described in Section 4.5.5), and Section 4.5.6), migration (described in Section 4.5.5), and
replication (described in Section 4.5.4). replication (described in Section 4.5.4).
o A file system present in a server's pseudo-fs may have multiple o A file system present in a server's pseudo-fs may have multiple
file system instances on different servers associated with it. file system instances on different servers associated with it.
All such instances are considered replicas of one another. All such instances are considered replicas of one another.
skipping to change at page 11, line 25 skipping to change at page 12, line 25
location attributes. Each such entry specifies a server, in the location attributes. Each such entry specifies a server, in the
form of a host name or IP address, and an fs name, which form of a host name or IP address, and an fs name, which
designates the location of the file system within the server's designates the location of the file system within the server's
pseudo-fs. A location entry designates a set of server endpoints pseudo-fs. A location entry designates a set of server endpoints
to which the client may establish connections. There may be to which the client may establish connections. There may be
multiple endpoints because a host name may map to multiple network multiple endpoints because a host name may map to multiple network
addresses and because multiple connection types may be used to addresses and because multiple connection types may be used to
communicate with a single network address. However, all such communicate with a single network address. However, all such
endpoints MUST provide a way of connecting to a single server. endpoints MUST provide a way of connecting to a single server.
The exact form of the location entry varies with the particular The exact form of the location entry varies with the particular
location attribute used as described in Section 4.3. location attribute used, as described in Section 4.3.
o Location elements are derived from location entries and each o Location elements are derived from location entries and each
describes a particular network access path, consisting of a describes a particular network access path, consisting of a
network address and a location within the server's pseudo-fs. network address and a location within the server's pseudo-fs.
Location elements need not appear within a location attribute, but Location elements need not appear within a location attribute, but
the existence of each location element derives from a the existence of each location element derives from a
corresponding location entry. When a location entry specifies an corresponding location entry. When a location entry specifies an
IP address there is only a single corresponding location element. IP address there is only a single corresponding location element.
Location entries that contain a host name are resolved using DNS, Location entries that contain a host name are resolved using DNS,
and may result in one or more location elements. All location and may result in one or more location elements. All location
skipping to change at page 12, line 26 skipping to change at page 13, line 26
namespace of one server can be associated with one or more instances namespace of one server can be associated with one or more instances
of that file system on other servers. These attributes contain of that file system on other servers. These attributes contain
location entries specifying a server address target (either as a DNS location entries specifying a server address target (either as a DNS
name representing one or more IP addresses or as a specific IP name representing one or more IP addresses or as a specific IP
address) together with the pathname of that file system within the address) together with the pathname of that file system within the
associated single-server namespace. associated single-server namespace.
The fs_locations_info RECOMMENDED attribute allows specification of The fs_locations_info RECOMMENDED attribute allows specification of
one or more file system instance locations where the data one or more file system instance locations where the data
corresponding to a given file system may be found. This attribute corresponding to a given file system may be found. This attribute
provides to the client, in to addition to specification of file provides to the client, in addition to specification of file system
system instance locations, other helpful information such as: instance locations, other helpful information such as:
o Information guiding choices among the various file system o Information guiding choices among the various file system
instances provided (e.g., priority for use, writability, currency, instances provided (e.g., priority for use, writability, currency,
etc.). etc.).
o Information to help the client efficiently effect as seamless a o Information to help the client efficiently effect as seamless a
transition as possible among multiple file system instances, when transition as possible among multiple file system instances, when
and if that should be necessary. and if that should be necessary.
o Information helping to guide the selection of the appropriate o Information helping to guide the selection of the appropriate
skipping to change at page 12, line 51 skipping to change at page 13, line 51
entry corresponds to a location entry with the fls_server field entry corresponds to a location entry with the fls_server field
designating the server, with the location pathname within the designating the server, with the location pathname within the
server's pseudo-fs given by the fl_rootpath field of the encompassing server's pseudo-fs given by the fl_rootpath field of the encompassing
fs_locations_item4. fs_locations_item4.
The fs_locations attribute defined in NFSv4.0 is also a part of The fs_locations attribute defined in NFSv4.0 is also a part of
NFSv4.1. This attribute only allows specification of the file system NFSv4.1. This attribute only allows specification of the file system
locations where the data corresponding to a given file system may be locations where the data corresponding to a given file system may be
found. Servers should make this attribute available whenever found. Servers should make this attribute available whenever
fs_locations_info is supported, but client use of fs_locations_info fs_locations_info is supported, but client use of fs_locations_info
is preferable. is preferable, as it provides more information.
Within the fs_locations attribute, each fs_location4 contains a Within the fs_locations attribute, each fs_location4 contains a
location entry with the server field designating the server and the location entry with the server field designating the server and the
rootpath field giving the location pathname within the server's rootpath field giving the location pathname within the server's
pseudo-fs. pseudo-fs.
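The relationship between a location entry (a server plus a pathname
within that server's pseudo-fs) and the location elements derived from
it can be illustrated with a short sketch. This is not protocol text.
The names LocationEntry, LocationElement, derive_elements, and the
injected resolve callable are hypothetical, the pathname is simplified
to a single string, and the resolver merely stands in for DNS. The
sketch assumes only what is stated above: an entry holding an IP
address yields exactly one element, while an entry holding a host name
may yield one or more.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Callable, List


@dataclass(frozen=True)
class LocationEntry:
    server: str    # host name or IP address (hypothetical simplification)
    rootpath: str  # location within the server's pseudo-fs, as one string


@dataclass(frozen=True)
class LocationElement:
    address: str   # one specific network address
    rootpath: str  # same pseudo-fs location as the originating entry


def is_ip_literal(text: str) -> bool:
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ip_address(text)
        return True
    except ValueError:
        return False


def derive_elements(entry: LocationEntry,
                    resolve: Callable[[str], List[str]]) -> List[LocationElement]:
    """Derive location elements from one location entry.

    resolve is an assumed stand-in for a DNS lookup: it maps a host
    name to a list of address strings.
    """
    if is_ip_literal(entry.server):
        # An IP address corresponds to a single location element.
        return [LocationElement(entry.server, entry.rootpath)]
    # A host name may map to several addresses, one element for each.
    return [LocationElement(addr, entry.rootpath)
            for addr in resolve(entry.server)]
```

Passing the resolver as a parameter keeps the sketch independent of any
particular DNS library and makes the derivation easy to exercise with
fixed data.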
4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661
Previously, issues related to the fact that multiple location entries Previously, issues related to the fact that multiple location entries
directed the client to the same file system instance were dealt with directed the client to the same file system instance were dealt with
skipping to change at page 14, line 19 skipping to change at page 15, line 19
server can be associated with a namespace defined by another server, server can be associated with a namespace defined by another server,
thus allowing a general multi-server namespace facility. A thus allowing a general multi-server namespace facility. A
designation of such a remote instance, in place of a file system designation of such a remote instance, in place of a file system
never previously present, is called a "pure referral" and is never previously present, is called a "pure referral" and is
discussed in Section 4.5.6 below. discussed in Section 4.5.6 below.
Because client support for location-related attributes is OPTIONAL, a Because client support for location-related attributes is OPTIONAL, a
server may (but is not required to) take action to hide migration and server may (but is not required to) take action to hide migration and
referral events from such clients, by acting as a proxy, for example. referral events from such clients, by acting as a proxy, for example.
The server can determine the presence of client support from the The server can determine the presence of client support from the
arguments of the EXCHANGE_ID operation (see Section 13.3 in the arguments of the EXCHANGE_ID operation (see Section 14.3 in the
current document). current document).
4.5.1. Combining Multiple Uses in a Single Attribute (to be added) 4.5.1. Combining Multiple Uses in a Single Attribute (to be added)
A location attribute will sometimes contain information relating to A location attribute will sometimes contain information relating to
the location of multiple replicas which may be used in different the location of multiple replicas which may be used in different
ways. ways.
o Location entries that relate to the file system instance currently o Location entries that relate to the file system instance currently
in use provide trunking information, allowing the client to find in use provide trunking information, allowing the client to find
skipping to change at page 15, line 38 skipping to change at page 16, line 38
network addresses. It might use the latter form because of DNS- network addresses. It might use the latter form because of DNS-
related security concerns or because the set of addresses to be used related security concerns or because the set of addresses to be used
might require active management by the server. might require active management by the server.
Location entries used to discover candidate addresses for use in Location entries used to discover candidate addresses for use in
trunking are subject to change, as discussed in Section 4.5.7 below. trunking are subject to change, as discussed in Section 4.5.7 below.
The client may respond to such changes by using additional addresses The client may respond to such changes by using additional addresses
once they are verified or by ceasing to use existing ones. The once they are verified or by ceasing to use existing ones. The
server can force the client to cease using an address by returning server can force the client to cease using an address by returning
NFS4ERR_MOVED when that address is used to access a file system. NFS4ERR_MOVED when that address is used to access a file system.
This allows a transfer of access similar to migration, although the This allows a transfer of client access which is similar to
same file system instance is accessed throughout. migration, although the same file system instance is accessed
throughout.
4.5.3. Location Attributes and Connection Type Selection (to be added) 4.5.3. Location Attributes and Connection Type Selection (to be added)
Because of the need to support multiple connections, clients face the Because of the need to support multiple connections, clients face the
issue of determining the proper connection type to use when issue of determining the proper connection type to use when
establishing a connection to a given server network address. In some establishing a connection to a given server network address. In some
cases, this issue can be addressed through the use of the connection cases, this issue can be addressed through the use of the connection
"step-up" facility described in Section 18.16 of [RFC5661]. However, "step-up" facility described in Section 18.16 of [RFC5661]. However,
because there are cases in which that fcility is not available, the because there are cases in which that facility is not available, the
client may have to choose a connection type with no possibility of client may have to choose a connection type with no possibility of
changing it within the scope of a single connection. changing it within the scope of a single connection.
The two location attributes differ as to the information made The two location attributes differ as to the information made
available in this regard. Fs_locations provides no information to available in this regard. Fs_locations provides no information to
support connection type selection. As a result, clients supporting support connection type selection. As a result, clients supporting
multiple connection types need to attempt to establish a connection multiple connection types would need to attempt to establish
on multiple connection types until the one preferred by the client is connections using multiple connection types until the one preferred
successfully established. by the client is successfully established.
Fs_locations_info provides a flag, FSLI4TF_RDMA, indicating Fs_locations_info provides a flag, FSLI4TF_RDMA, indicating
that RPC-over-RDMA support is available using the specfied location that RPC-over-RDMA support is available using the specified location
entry. This flag makes it convenient for a client wishing to entry. This flag makes it convenient for a client wishing to
use RDMA to establish a TCP connection and then convert to use of use RDMA to establish a TCP connection and then convert to use of
RDMA. After establishing a TCP connection, the step-up facility can RDMA. After establishing a TCP connection, the step-up facility can
be used, if available, to convert that connection to RDMA mode. be used, if available, to convert that connection to RDMA mode.
Otherwise, if RDMA availability is indicated, a new RDMA connection Otherwise, if RDMA availability is indicated, a new RDMA connection
can be established and it can be bound to the sessiion already can be established and it can be bound to the session already
established by the TCP connection, allowing the TCP connection to be established by the TCP connection, allowing the TCP connection to be
dropped and the session converted to further use in RDMA mode. dropped and the session converted to further use in RDMA mode.
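The order of choices described in the preceding paragraph can be
sketched as a small planning function. This is only an illustration
under stated assumptions: the function name plan_rdma_transition, its
boolean parameters, and the step strings are hypothetical, and no
actual connection handling is performed. The parameter rdma_flag_set
represents whether FSLI4TF_RDMA was set for the location entry.

```python
from typing import List


def plan_rdma_transition(rdma_flag_set: bool,
                         step_up_available: bool) -> List[str]:
    """List the steps a client wishing to use RDMA might take.

    Both parameters are assumed inputs: rdma_flag_set reflects the
    FSLI4TF_RDMA flag of the location entry, and step_up_available
    reflects whether the connection step-up facility can be used.
    """
    steps = ["establish TCP connection"]
    if step_up_available:
        # Convert the existing TCP connection to RDMA mode.
        steps.append("step up TCP connection to RDMA mode")
    elif rdma_flag_set:
        # No step-up: open a separate RDMA connection and bind it to
        # the session created over TCP, then drop the TCP connection.
        steps.append("establish new RDMA connection")
        steps.append("bind RDMA connection to existing session")
        steps.append("drop TCP connection")
    # With neither option, the client simply remains on TCP.
    return steps
```

Returning a list of steps rather than performing them keeps the sketch
free of any transport library and focuses on the decision order.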
4.5.4. File System Replication (as updated) 4.5.4. File System Replication (as updated)
The fs_locations and fs_locations_info attributes provide alternative The fs_locations and fs_locations_info attributes provide alternative
locations, to be used to access data in place of or in addition to locations, to be used to access data in place of or in addition to
the current file system instance. On first access to a file system, the current file system instance. On first access to a file system,
the client should obtain the set of alternate locations by the client should obtain the set of alternate locations by
interrogating the fs_locations or fs_locations_info attribute, with interrogating the fs_locations or fs_locations_info attribute, with
skipping to change at page 17, line 4 skipping to change at page 18, line 4
fs_locations and fs_locations_info attributes and how the client fs_locations and fs_locations_info attributes and how the client
deals with file system transition issues will be discussed in detail deals with file system transition issues will be discussed in detail
below. below.
4.5.5. File System Migration (as updated) 4.5.5. File System Migration (as updated)
When a file system is present and becomes absent, clients can be When a file system is present and becomes absent, clients can be
given the opportunity to have continued access to their data, at an given the opportunity to have continued access to their data, at an
alternate location, as specified by a location attribute. This alternate location, as specified by a location attribute. This
migration of access to another replica includes the ability to retain migration of access to another replica includes the ability to retain
locks across the transition, either by reclaim or by Transparent locks across the transition, either by using lock reclaim or by
State Migration. taking advantage of Transparent State Migration.
Typically, a client will be accessing the file system in question, Typically, a client will be accessing the file system in question,
get an NFS4ERR_MOVED error, and then use a location attribute to get an NFS4ERR_MOVED error, and then use a location attribute to
determine the new location of the data. When fs_locations_info is determine the new location of the data. When fs_locations_info is
used, additional information will be available that will define the used, additional information will be available that will define the
nature of the client's handling of the transition to a new server. nature of the client's handling of the transition to a new server.
Such migration can be helpful in providing load balancing or general Such migration can be helpful in providing load balancing or general
resource reallocation. The protocol does not specify how the file resource reallocation. The protocol does not specify how the file
system will be moved between servers. It is anticipated that a system will be moved between servers. It is anticipated that a
skipping to change at page 17, line 30 skipping to change at page 18, line 30
The new location may be, in the case of various forms of server The new location may be, in the case of various forms of server
clustering, another server providing access to the same physical file clustering, another server providing access to the same physical file
system. The client's responsibilities in dealing with this system. The client's responsibilities in dealing with this
transition will depend on whether migration has occurred and the transition will depend on whether migration has occurred and the
means the server has chosen to provide continuity of locking state. means the server has chosen to provide continuity of locking state.
These issues will be discussed in detail below. These issues will be discussed in detail below.
Although a single successor location is typical, multiple locations Although a single successor location is typical, multiple locations
may be provided. When multiple locations are provided, the client may be provided. When multiple locations are provided, the client
use the first one provided. If that is inaccessible for some reason, will typically use the first one provided. If that is inaccessible
later ones can be used. In such cases the client might consider that for some reason, later ones can be used. In such cases the client
the transition to the new replica is a migration event, although it might consider the transition to the new replica as a migration
would lose access to locking state if it did so. event, even though some of the servers involved might not be aware of
the use of the server which was inaccessible. In such a case, a
client might lose access to locking state as a result of the access
transfer.
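The handling of multiple successor locations described above might be
sketched as follows. The function name select_successor and the
is_accessible callable are hypothetical. The sketch assumes only the
stated behavior: the first location is typically used, and later ones
are tried when an earlier one is inaccessible. The second element of
the result records that a fallback was taken, since in that case the
client might lose access to locking state.

```python
from typing import Callable, Optional, Sequence, Tuple


def select_successor(locations: Sequence[str],
                     is_accessible: Callable[[str], bool]
                     ) -> Tuple[Optional[str], bool]:
    """Pick the first accessible successor location.

    Returns (location, used_fallback). used_fallback is True when the
    chosen location was not the first one provided. When no location
    is accessible, the result is (None, False).
    """
    for index, location in enumerate(locations):
        if is_accessible(location):
            return location, index > 0
    return None, False
```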
When an alternate location is designated as the target for migration, When an alternate location is designated as the target for migration,
it must designate the same data (with metadata being the same to the it must designate the same data (with metadata being the same to the
degree indicated by the fs_locations_info attribute). Where file degree indicated by the fs_locations_info attribute). Where file
systems are writable, a change made on the original file system must systems are writable, a change made on the original file system must
be visible on all migration targets. Where a file system is not be visible on all migration targets. Where a file system is not
writable but represents a read-only copy (possibly periodically writable but represents a read-only copy (possibly periodically
updated) of a writable file system, similar requirements apply to the updated) of a writable file system, similar requirements apply to the
propagation of updates. Any change visible in the original file propagation of updates. Any change visible in the original file
system must already be effected on all migration targets, to avoid system must already be effected on all migration targets, to avoid
any possibility that a client, in effecting a transition to the any possibility that a client, in effecting a transition to the
migration target, will see any reversion in file system state. migration target, will see any reversion in file system state.
4.5.6. Referrals (as updated) 4.5.6. Referrals (as updated)
Referrals allow the server to associate a file system located on one Referrals allow the server to associate a file system namespace entry
server with file system located on another server. When this located on one server with a file system located on another server.
includes the use of pure referrals, servers are provided a way of When this includes the use of pure referrals, servers are provided a
placing a file system in a location within the namespace essentially way of placing a file system in a location within the namespace
without respect to its physical location on a particular server. essentially without respect to its physical location on a particular
This allows a single server or a set of servers to present a multi- server. This allows a single server or a set of servers to present a
server namespace that encompasses file systems located on a wider multi-server namespace that encompasses file systems located on a
range of servers. Some likely uses of this facility include wider range of servers. Some likely uses of this facility include
establishment of site-wide or organization-wide namespaces, with the establishment of site-wide or organization-wide namespaces, with the
eventual possibility of combining such together into a truly global eventual possibility of combining such together into a truly global
namespace. namespace.
Referrals occur when a client determines, upon first referencing a Referrals occur when a client determines, upon first referencing a
position in the current namespace, that it is part of a new file position in the current namespace, that it is part of a new file
system and that the file system is absent. When this occurs, system and that the file system is absent. When this occurs,
typically by receiving the error NFS4ERR_MOVED, the actual location typically upon receiving the error NFS4ERR_MOVED, the actual location
or locations of the file system can be determined by fetching the or locations of the file system can be determined by fetching a
fs_locations or fs_locations_info attribute. locations attribute.
The locations-related attribute may designate a single file system The locations attribute may designate a single file system location
location or multiple file system locations, to be selected based on or multiple file system locations, to be selected based on the needs
the needs of the client. The server, in the fs_locations_info of the client. The server, in the fs_locations_info attribute, may
attribute, may specify priorities to be associated with various file specify priorities to be associated with various file system location
system location choices. The server may assign different priorities choices. The server may assign different priorities to different
to different locations as reported to individual clients, in order to locations as reported to individual clients, in order to adapt to
adapt to client physical location or to effect load balancing. When client physical location or to effect load balancing. When both
both read-only and read-write file systems are present, some of the read-only and read-write file systems are present, some of the read-
read-only locations might not be absolutely up-to-date (as they would only locations might not be absolutely up-to-date (as they would have
have to be in the case of replication and migration). Servers may to be in the case of replication and migration). Servers may also
also specify file system locations that include client-substituted specify file system locations that include client-substituted
variables so that different clients are referred to different file variables so that different clients are referred to different file
systems (with different data contents) based on client attributes systems (with different data contents) based on client attributes
such as CPU architecture. such as CPU architecture.
When the fs_locations_info attribute is such that there are When the fs_locations_info attribute is such that there are
multiple possible targets listed, the relationships among them may be multiple possible targets listed, the relationships among them may be
important to the client in selecting which one to use. The same important to the client in selecting which one to use. The same
rules specified in Section 4.5.5 above regarding multiple migration rules specified in Section 4.5.5 above regarding multiple migration
targets apply to these multiple replicas as well. For example, the targets apply to these multiple replicas as well. For example, the
client might prefer a writable target on a server that has additional client might prefer a writable target on a server that has additional
skipping to change at page 19, line 16 skipping to change at page 20, line 21
providing a large set of pure referrals to all of the included file providing a large set of pure referrals to all of the included file
systems. Alternatively, a single multi-server namespace may be systems. Alternatively, a single multi-server namespace may be
administratively segmented with separate referral file systems (on administratively segmented with separate referral file systems (on
separate servers) for each separately administered portion of the separate servers) for each separately administered portion of the
namespace. The top-level referral file system or any segment may use namespace. The top-level referral file system or any segment may use
replicated referral file systems for higher availability. replicated referral file systems for higher availability.
Generally, multi-server namespaces are for the most part uniform, in Generally, multi-server namespaces are for the most part uniform, in
that the same data made available to one client at a given location that the same data made available to one client at a given location
in the namespace is made available to all clients at that location. in the namespace is made available to all clients at that location.
However, there are facilities provided that allow different clients However, as described above, there are facilities provided that allow
to be directed to different sets of data, to enable adaptation to such different clients to be directed to different sets of data, to enable
client characteristics as CPU architecture. adaptation to such client characteristics as CPU architecture.
4.5.7. Changes in a Location Attribute (to be added) 4.5.7. Changes in a Location Attribute (to be added)
Although clients will typically fetch a location attribute when first Although clients will typically fetch a location attribute when first
accessing a file system and when NFS4ERR_MOVED is returned, a client accessing a file system and when NFS4ERR_MOVED is returned, a client
can choose to fetch the attribute periodically, in which case, the can choose to fetch the attribute periodically, in which case the
value fetched may change over time. value fetched may change over time.
For clients not prepared to access multiple replicas simultaneously For clients not prepared to access multiple replicas simultaneously
(see Section 8.1 of the current document), the handling of the (see Section 8.1 of the current document), the handling of the
various cases of change are as follows: various cases of change is as follows:
o Changes in the list of replicas or in the network addresses o Changes in the list of replicas or in the network addresses
associated with replicas do not require immediate action. The associated with replicas do not require immediate action. The
client will typically update its list of replicas to reflect the client will typically update its list of replicas to reflect the
new information. new information.
o Additions to the list of network addresses for the current file o Additions to the list of network addresses for the current file
system instance need not be acted on promptly. However, the client system instance need not be acted on promptly. However, the client
can choose to use the new address whenever it needs to switch can choose to use the new address whenever it needs to switch
access to a new replica. access to a new replica.
skipping to change at page 20, line 12 skipping to change at page 21, line 19
adjusting its access even in the absence of difficulties that would adjusting its access even in the absence of difficulties that would
lead to a new replica being selected. lead to a new replica being selected.
o When a new replica is added which may be accessed simultaneously o When a new replica is added which may be accessed simultaneously
with one currently in use, the client is free to use the new with one currently in use, the client is free to use the new
replica immediately. replica immediately.
o When a replica currently in use is deleted from the list, the o When a replica currently in use is deleted from the list, the
client need not cease using it immediately. However, since the client need not cease using it immediately. However, since the
server may subsequently force such use to cease (by returning server may subsequently force such use to cease (by returning
NFS4ERR_MOVED), clients can decide to limit the need for later NFS4ERR_MOVED), clients might decide to limit the need for later
state transfer. For example, new opens might be done on other state transfer. For example, new opens might be done on other
replicas, rather than on one not present in the list. replicas, rather than on one not present in the list.
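The last case above, in which a replica currently in use has been
deleted from the list, can be illustrated with a small sketch. The
function name replica_for_new_open is hypothetical, and the choice of
the first listed replica as the alternative is purely an assumption
made for illustration. The text only says that new opens might be done
on other replicas.

```python
from typing import Sequence


def replica_for_new_open(current: str, listed: Sequence[str]) -> str:
    """Choose the replica on which to perform a new open.

    The current replica is kept while it still appears in the location
    attribute, or when no alternative is listed. Otherwise the first
    listed replica is chosen (an illustrative assumption), which limits
    the state that would need transfer if the server later returns
    NFS4ERR_MOVED for the current replica.
    """
    if current in listed or not listed:
        return current
    return listed[0]
```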
5. Re-organization of Section 11.7 of RFC5661 5. Re-organization of Section 11.7 of RFC5661
The material in Section 11.7 of [RFC5661] has been reorganized and The material in Section 11.7 of [RFC5661] has been reorganized and
augmented as specified below: augmented as specified below:
o Because there can be a shift of the network access paths used to o Because there can be a shift of the network access paths used to
access a file system instance without any shift between replicas, access a file system instance without any shift between replicas,
skipping to change at page 22, line 11 skipping to change at page 23, line 19
o When there is no potential replacement address in use and there o When there is no potential replacement address in use and there
are no valid addresses session-trunkable with the one whose use is are no valid addresses session-trunkable with the one whose use is
to be discontinued, other server-trunkable addresses may be used to be discontinued, other server-trunkable addresses may be used
to provide continued access. Although use of CREATE_SESSION is to provide continued access. Although use of CREATE_SESSION is
available to provide continued access to the existing instance, available to provide continued access to the existing instance,
servers have the option of providing continued access to the servers have the option of providing continued access to the
existing session through the new network access path in a fashion existing session through the new network access path in a fashion
similar to that provided by session migration (see Section 9 of similar to that provided by session migration (see Section 9 of
the current document). To take advantage of this possibility, the current document). To take advantage of this possibility,
clients can perform an initial BIND_CONN_TO_SESSION, as in the clients can perform an initial BIND_CONN_TO_SESSION, as in the
previous case, and use CREATE_SESSION only when that fails. previous case, and use CREATE_SESSION only if that fails.
8. Effecting File System Transitions (as updated) 8. Effecting File System Transitions (as updated)
There are a range of situations in which there is a change to be There are a range of situations in which there is a change to be
effected in the set of replicas used to access a particular file effected in the set of replicas used to access a particular file
system. Some of these may involve an expansion or contraction of the system. Some of these may involve an expansion or contraction of the
set of replicas used as discussed in Section 8.1 below. set of replicas used as discussed in Section 8.1 below.
For reasons explained in that section, most transitions will involve For reasons explained in that section, most transitions will involve
a transition from a single replica to a corresponding replacement a transition from a single replica to a corresponding replacement
skipping to change at page 23, line 8 skipping to change at page 24, line 17
effective continuity of locking state are discussed in Section 10 effective continuity of locking state are discussed in Section 10
of the current document. of the current document.
o The servers' (source and destination) responsibilities in o The servers' (source and destination) responsibilities in
effecting Transparent Migration of locking and session state are effecting Transparent Migration of locking and session state are
discussed in Section 11 of the current document. discussed in Section 11 of the current document.
8.1. File System Transitions and Simultaneous Access (as updated) 8.1. File System Transitions and Simultaneous Access (as updated)
The fs_locations_info attribute (described in Section 11.10.1 of The fs_locations_info attribute (described in Section 11.10.1 of
[RFC5661]) may indicate that two replicas may be used simultaneously [RFC5661] and Section 12.2 of this document) may indicate that two
(see Section 11.7.2.1 of [RFC5661] for details). Although situations replicas may be used simultaneously (see Section 11.7.2.1 of
in which multiple replicas may be accessed simultaneously are [RFC5661] for details). Although situations in which multiple
somewhat similar to those in which a single replica is accessed by replicas may be accessed simultaneously are somewhat similar to those
multiple network addresses, there are important differences, since in which a single replica is accessed by multiple network addresses,
locking state is not shared among multiple replicas. there are important differences, since locking state is not shared
among multiple replicas.
Because of this difference in state handling, many clients will not Because of this difference in state handling, many clients will not
have the ability to take advantage of the fact that such replicas have the ability to take advantage of the fact that such replicas
represent the same data. Such clients will not be prepared to use represent the same data. Such clients will not be prepared to use
multiple replicas simultaneously but will access each file system multiple replicas simultaneously but will access each file system
using only a single replica, although the replica selected may make using only a single replica, although the replica selected might make
multiple server-trunkable addresses available. multiple server-trunkable addresses available.
Clients who are prepared to use multiple replicas simultaneously will Clients who are prepared to use multiple replicas simultaneously will
divide opens among replicas however they choose. Once that choice is divide opens among replicas however they choose. Once that choice is
made, any subsequent transitions will treat the set of locking state made, any subsequent transitions will treat the set of locking state
associated with each replica as a single entity. associated with each replica as a single entity.
For example, if one of the replicas becomes unavailable, access will For example, if one of the replicas becomes unavailable, access will
be transferred to a different replica, also capable of simultaneous be transferred to a different replica, also capable of simultaneous
access with the one still in use. access with the one still in use.
skipping to change at page 25, line 38 skipping to change at page 26, line 49
possible. possible.
Although normally a single source file system will transition to a Although normally a single source file system will transition to a
single target file system, there is a provision for splitting a single target file system, there is a provision for splitting a
single source file system into multiple target file systems, by single source file system into multiple target file systems, by
specifying the FSLI4F_MULTI_FS flag. specifying the FSLI4F_MULTI_FS flag.
8.4.1. File System Splitting (as updated) 8.4.1. File System Splitting (as updated)
When a file system transition is made and the fs_locations_info When a file system transition is made and the fs_locations_info
indicates that the file system in question may be split into multiple indicates that the file system in question might be split into
file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do multiple file systems (via the FSLI4F_MULTI_FS flag), the client
GETATTRs to determine the fsid attribute on all known objects within SHOULD do GETATTRs to determine the fsid attribute on all known
the file system undergoing transition to determine the new file objects within the file system undergoing transition to determine the
system boundaries. new file system boundaries.
Clients may maintain the fsids passed to existing applications by Clients might choose to maintain the fsids passed to existing
mapping all of the fsids for the descendant file systems to the applications by mapping all of the fsids for the descendant file
common fsid used for the original file system. systems to the common fsid used for the original file system.
Splitting a file system may be done on a transition between file Splitting a file system can be done on a transition between file
systems of the same fileid class, since the fact that fileids are systems of the same fileid class, since the fact that fileids are
unique within the source file system ensures they will be unique in unique within the source file system ensures they will be unique in
each of the target file systems. each of the target file systems.
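The fsid mapping described above can be sketched as follows. This is a minimal illustration, not part of either draft: the class name FsidMapper, its methods, and the representation of an fsid as a (major, minor) tuple are all assumptions made for the example.

```python
class FsidMapper:
    """Hypothetical helper: keeps the fsid seen by applications stable
    after a file system is split into several descendant file systems."""

    def __init__(self, original_fsid):
        # fsid reported to applications before the transition
        self.original_fsid = original_fsid
        # fsids discovered via GETATTR on known objects after the split
        self.descendants = set()

    def note_descendant(self, fsid):
        """Record an fsid found on an object of the split file system."""
        self.descendants.add(fsid)

    def fsid_for_application(self, server_fsid):
        """Map every descendant fsid back to the common original fsid;
        fsids of unrelated file systems pass through unchanged."""
        if server_fsid in self.descendants:
            return self.original_fsid
        return server_fsid
```

A client that chooses not to hide the split would simply skip this mapping and expose the new fsids directly.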
8.5. The Change Attribute and File System Transitions (as updated) 8.5. The Change Attribute and File System Transitions (as updated)
Since the change attribute is defined as a server-specific one, Since the change attribute is defined as a server-specific one,
change attributes fetched from one server are normally presumed to be change attributes fetched from one server are normally presumed to be
invalid on another server. Such a presumption is troublesome since invalid on another server. Such a presumption is troublesome since
it would invalidate all cached change attributes, requiring it would invalidate all cached change attributes, requiring
skipping to change at page 26, line 27 skipping to change at page 27, line 38
happening to result in an identical change value. happening to result in an identical change value.
When the two file systems have consistent change attribute formats, When the two file systems have consistent change attribute formats,
and this fact is communicated to the client by reporting in the same and this fact is communicated to the client by reporting in the same
change class, the client may assume a continuity of change attribute change class, the client may assume a continuity of change attribute
construction and handle this situation just as it would be handled construction and handle this situation just as it would be handled
without any file system transition. without any file system transition.
8.6. Write Verifiers and File System Transitions (as updated) 8.6. Write Verifiers and File System Transitions (as updated)
In a file system transition, the two file systems may be clustered in In a file system transition, the two file systems might be clustered
the handling of unstably written data. When this is the case, and in the handling of unstably written data. When this is the case, and
the two file systems belong to the same write-verifier class, write the two file systems belong to the same write-verifier class, write
verifiers returned from one system may be compared to those returned verifiers returned from one system may be compared to those returned
by the other and superfluous writes avoided. by the other and superfluous writes avoided.
When two file systems belong to different write-verifier classes, any When two file systems belong to different write-verifier classes, any
verifier generated by one must not be compared to one provided by the verifier generated by one must not be compared to one provided by the
other. Instead, the two verifiers should be treated as not equal other. Instead, the two verifiers should be treated as not equal
even when the values are identical. even when the values are identical.
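The comparison rule in this section can be sketched as below. The function name and the representation of classes as integers and verifiers as bytes are assumptions made for illustration only.

```python
def verifiers_match(class_a, verifier_a, class_b, verifier_b):
    """Hypothetical helper: decide whether two write verifiers may be
    treated as equal across a file system transition."""
    if class_a != class_b:
        # Different write-verifier classes: treat as not equal even
        # when the verifier values happen to be identical.
        return False
    # Same class: the values may be compared directly.
    return verifier_a == verifier_b
```

When the function returns False, the client would resend the unstably written data rather than assume it survived the transition.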
8.7. Readdir Cookies and Verifiers and File System Transitions (as 8.7. Readdir Cookies and Verifiers and File System Transitions (as
updated) updated)
In a file system transition, the two file systems may be consistent In a file system transition, the two file systems might be consistent
in their handling of READDIR cookies and verifiers. When this is the in their handling of READDIR cookies and verifiers. When this is the
case, and the two file systems belong to the same readdir class, case, and the two file systems belong to the same readdir class,
READDIR cookies and verifiers from one system may be recognized by READDIR cookies and verifiers from one system may be recognized by
the other and READDIR operations started on one server may be validly the other and READDIR operations started on one server may be validly
continued on the other, simply by presenting the cookie and verifier continued on the other, simply by presenting the cookie and verifier
returned by a READDIR operation done on the first file system to the returned by a READDIR operation done on the first file system to the
second. second.
When two file systems belong to different readdir classes, any When two file systems belong to different readdir classes, any
READDIR cookie and verifier generated by one is not valid on the READDIR cookie and verifier generated by one is not valid on the
skipping to change at page 28, line 33 skipping to change at page 29, line 47
8.9. Lock State and File System Transitions (as updated) 8.9. Lock State and File System Transitions (as updated)
While accessing a file system, clients obtain locks enforced by the While accessing a file system, clients obtain locks enforced by the
server which may prevent actions by other clients that are server which may prevent actions by other clients that are
inconsistent with those locks. inconsistent with those locks.
When access is transferred between replicas, clients need to be When access is transferred between replicas, clients need to be
assured that the actions disallowed by holding these locks cannot assured that the actions disallowed by holding these locks cannot
have occurred during the transition. This can be ensured by the have occurred during the transition. This can be ensured by the
methods below. If at least one of these is not implemented, clients methods below. Unless at least one of these is implemented, clients
will not be assured of continuity of lock possession across a will not be assured of continuity of lock possession across a
migration event. migration event.
o Providing the client an opportunity to re-obtain its locks via a o Providing the client an opportunity to re-obtain its locks via a
per-fs grace period on the destination server. Because the lock per-fs grace period on the destination server. Because the lock
reclaim mechanism was originally defined to support server reboot, reclaim mechanism was originally defined to support server reboot,
it implicitly assumes that file handles on reclaim will be it implicitly assumes that file handles on reclaim will be
the same as those at open. In the case of migration this requires the same as those at open. In the case of migration, this
that source and destination servers use the same filehandles, as requires that source and destination servers use the same
evidenced by using the same server scope (see Section 12.2 of the filehandles, as evidenced by using the same server scope (see
current document) or by showing this agreement using Section 13.2 of the current document) or by showing this agreement
fs_locations_info (see Section 8.2 above). using fs_locations_info (see Section 8.2 above).
o Transferring locking state as part of the transition as described o Locking state can be transferred as part of the transition by
in Section 9 of the current document to provide Transparent State providing Transparent State Migration as described in Section 9 of
Migration. the current document.
Of these, Transparent State Migration provides the smoother Of these, Transparent State Migration provides the smoother
experience for clients in that there is no grace-period-based delay experience for clients in that there is no grace-period-based delay
before new locks can be obtained. However, it requires a greater before new locks can be obtained. However, it requires a greater
degree of inter-server co-ordination. In general, the servers taking degree of inter-server co-ordination. In general, the servers taking
part in migration are free to provide either facility. However, when part in migration are free to provide either facility. However, when
the filehandles can differ across the migration event, Transparent the filehandles can differ across the migration event, Transparent
State Migration is the only available means of providing the needed State Migration is the only available means of providing the needed
functionality. functionality.
It should be noted that these two methods are not mutually exclusive It should be noted that these two methods are not mutually exclusive
and that a server might well provide both. In particular, if there and that a server might well provide both. In particular, if there
is some circumstance preventing a specific lock from being is some circumstance preventing a specific lock from being
transferred transparently, the server can allow it to be reclaimed. transferred transparently, the destination server can allow it to be
reclaimed, by implementing a per-fs grace period for the migrated
file system.
9. Transferring State upon Migration (to be added) 9. Transferring State upon Migration (to be added)
When the transition is a result of a server-initiated decision to When the transition is a result of a server-initiated decision to
transition access and the source and destination servers have transition access and the source and destination servers have
implemented appropriate co-operation, it is possible to: implemented appropriate co-operation, it is possible to:
o Transfer locking state from the source to the destination server, o Transfer locking state from the source to the destination server,
in a fashion similar to that provide by Transparent State in a fashion similar to that provided by Transparent State
Migration in NFSv4.0, as described in [RFC7931]. Server Migration in NFSv4.0, as described in [RFC7931]. Server
responsibilities are described in Section 11.1 of the current responsibilities are described in Section 11.2 of the current
document. document.
o Transfer session state from the source to the destination server. o Transfer session state from the source to the destination server.
Server responsibilities in effecting such a transfer are described Server responsibilities in effecting such a transfer are described
in Section 11.2 of the current document. in Section 11.3 of the current document.
The means by which the client determines which of these transfer The means by which the client determines which of these transfer
events has occurred are described in Section 10 of the current events has occurred are described in Section 10 of the current
document. document.
9.1. Transparent State Migration and pNFS (to be added) 9.1. Transparent State Migration and pNFS (to be added)
When pNFS is involved, the protocol is capable of supporting: When pNFS is involved, the protocol is capable of supporting:
o Migration of the Metadata Server (MDS), leaving the Data Servers o Migration of the Metadata Server (MDS), leaving the Data Servers
skipping to change at page 30, line 42 skipping to change at page 32, line 11
Migration may transfer a file system from a server which does not Migration may transfer a file system from a server which does not
support pNFS to one which does. In order to properly adapt to this support pNFS to one which does. In order to properly adapt to this
situation, clients which support pNFS, but function adequately in its situation, clients which support pNFS, but function adequately in its
absence, should check for pNFS support when a file system is migrated absence, should check for pNFS support when a file system is migrated
and be prepared to use pNFS when support is available on the and be prepared to use pNFS when support is available on the
destination. destination.
10. Client Responsibilities when Access is Transitioned (to be added) 10. Client Responsibilities when Access is Transitioned (to be added)
For a client to respond to an access transition, it must be made For a client to respond to an access transition, it must become aware
aware of it. The ways in which this can happen are discussed in of it. The ways in which this can happen are discussed in
Section 10.1 which discusses indications that a specific file system Section 10.1 which discusses indications that a specific file system
access path has transitioned as well as situations in which access path has transitioned as well as situations in which
additional activity is necessary to determine the set of file systems additional activity is necessary to determine the set of file systems
that have been migrated. Section 10.2 goes on to complete the that have been migrated. Section 10.2 goes on to complete the
discussion of how the set of migrated file systems might be discussion of how the set of migrated file systems might be
determined. Sections 10.3 through 10.5 discuss how the client should determined. Sections 10.3 through 10.5 discuss how the client should
deal with each transition it becomes aware of, either directly or as deal with each transition it becomes aware of, either directly or as
a result of migration discovery. a result of migration discovery.
The following terms are used to describe client activities: The following terms are used to describe client activities:
skipping to change at page 32, line 15 skipping to change at page 33, line 35
to respond by using the location information to access the file to respond by using the location information to access the file
system at its new location to ensure that leases are not system at its new location to ensure that leases are not
needlessly expired. needlessly expired.
Unlike the case of NFSv4.0, in which the corresponding conditions are Unlike the case of NFSv4.0, in which the corresponding conditions are
both errors and thus mutually exclusive, in NFSv4.1 the client can, both errors and thus mutually exclusive, in NFSv4.1 the client can,
and often will, receive both indications on the same request. As a and often will, receive both indications on the same request. As a
result, implementations need to address the question of how to co- result, implementations need to address the question of how to co-
ordinate the necessary recovery actions when both indications arrive ordinate the necessary recovery actions when both indications arrive
in the response to the same request. It should be noted that when in the response to the same request. It should be noted that when
processing an NFSv4 COMPOUND, the server decides whether processing an NFSv4 COMPOUND, the server will normally decide whether
SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file
system will be referenced or whether NFS4ERR_MOVED is to be returned. system will be referenced or whether NFS4ERR_MOVED is to be returned.
Since these indications are not mutually exclusive in NFSv4.1, the Since these indications are not mutually exclusive in NFSv4.1, the
following combinations are possible results when a COMPOUND is following combinations are possible results when a COMPOUND is
issued: issued:
o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED
is asserted. is asserted.
skipping to change at page 33, line 16 skipping to change at page 34, line 33
file system(s) accessed by the request. However, to prevent file system(s) accessed by the request. However, to prevent
avoidable lease expiration, migration discovery needs to be done avoidable lease expiration, migration discovery needs to be done
o The COMPOUND status is not NFS4ERR_MOVED and o The COMPOUND status is not NFS4ERR_MOVED and
SEQ4_STATUS_LEASE_MOVED is clear. SEQ4_STATUS_LEASE_MOVED is clear.
In this case, neither transition-related activity nor migration In this case, neither transition-related activity nor migration
discovery is required. discovery is required.
Note that the specified actions only need to be taken if they are not Note that the specified actions only need to be taken if they are not
already going on. For example NFS4ERR_MOVED on a file system for already going on. For example, when NFS4ERR_MOVED is received when
which transition recovery already going on merely waits for that accessing a file system for which transition recovery already going
recovery to be completed while SEQ4_STATUS_LEASE_MOVED only needs to on, the client merely waits for that recovery to be completed while
the receipt of SEQ4_STATUS_LEASE_MOVED indication only needs to
initiate migration discovery for a server if it is not going on for initiate migration discovery for a server if it is not going on for
that server. that server.
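The rule that recovery actions are started only when not already in progress can be sketched as follows. The class RecoveryTracker, its method names, and the returned strings are hypothetical; a real client would also need to remove entries when the corresponding activity completes.

```python
class RecoveryTracker:
    """Hypothetical bookkeeping for in-progress recovery activity."""

    def __init__(self):
        self.transition_recovery = set()  # file systems being recovered
        self.migration_discovery = set()  # servers being scanned

    def on_moved(self, file_system):
        """NFS4ERR_MOVED was received while accessing file_system."""
        if file_system in self.transition_recovery:
            return "wait"   # recovery already going on; wait for it
        self.transition_recovery.add(file_system)
        return "start"      # begin transition recovery

    def on_lease_moved(self, server):
        """SEQ4_STATUS_LEASE_MOVED was asserted by server."""
        if server in self.migration_discovery:
            return "ignore"  # discovery already going on for this server
        self.migration_discovery.add(server)
        return "start"       # begin migration discovery
```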
The fact that a lease-migrated condition does not result in an error The fact that a lease-migrated condition does not result in an error
in NFSv4.1 has a number of important consequences. In addition to in NFSv4.1 has a number of important consequences. In addition to
the fact, discussed above, that the two indications are not mutually the fact, discussed above, that the two indications are not mutually
exclusive, there are a number of issues that are important in exclusive, there are a number of issues that are important in
considering implementation of migration discovery, as discussed in considering implementation of migration discovery, as discussed in
Section 10.2. Section 10.2.
skipping to change at page 34, line 26 skipping to change at page 35, line 43
o For such indications received in all other contexts, the o For such indications received in all other contexts, the
appropriate response is to initiate or otherwise provide for the appropriate response is to initiate or otherwise provide for the
execution of migration discovery for file systems associated with execution of migration discovery for file systems associated with
the server IP address returning the indication. the server IP address returning the indication.
This leaves a potential difficulty in situations in which the This leaves a potential difficulty in situations in which the
migration discovery process is near to completion but is still migration discovery process is near to completion but is still
operating. One should not ignore a LEASE_MOVED indication if the operating. One should not ignore a LEASE_MOVED indication if the
migration discovery process is not able to respond to the discovery migration discovery process is not able to respond to the discovery
of additional migrating file system without additional aid. A of additional migrating file systems without additional aid. A
further complexity relevant in addressing such situations is that a further complexity relevant in addressing such situations is that a
lease-migrated indication may reflect the server's state at the time lease-migrated indication may reflect the server's state at the time
the SEQUENCE operation was processed, which may be different from the SEQUENCE operation was processed, which may be different from
that in effect at the time the response is received. Because new that in effect at the time the response is received. Because new
migration events may occur at any time, and because a LEASE_MOVED migration events may occur at any time, and because a LEASE_MOVED
indication may reflect the situation in effect a considerable time indication may reflect the situation in effect a considerable time
before the indication is received, special care needs to be taken to before the indication is received, special care needs to be taken to
ensure that LEASE_MOVED indications are not inappropriately ignored. ensure that LEASE_MOVED indications are not inappropriately ignored.
A useful approach to this issue involves the use of separate A useful approach to this issue involves the use of separate
skipping to change at page 35, line 22 skipping to change at page 36, line 39
STATUS4_REFERRAL) and thus that it is likely that the fetch of the STATUS4_REFERRAL) and thus that it is likely that the fetch of the
location attribute has cleared one of the file systems contributing location attribute has cleared one of the file systems contributing
to the lease-migrated indication. to the lease-migrated indication.
o In cases in which that happened, the thread cannot know whether o In cases in which that happened, the thread cannot know whether
the lease-migrated indication has been cleared and so it enters the lease-migrated indication has been cleared and so it enters
the completion/verification state and proceeds to issue a COMPOUND the completion/verification state and proceeds to issue a COMPOUND
to see if the LEASE_MOVED indication has been cleared. to see if the LEASE_MOVED indication has been cleared.
o When the discovery process is in the completion/verification o When the discovery process is in the completion/verification
state, if others get a lease-migrated indication they note the it state, if other requests get a lease-migrated indication they note
was received and the existence of such indications is used when that it was received and the existence of such indications is used
the request completes, as described below. when the request completes, as described below.
When the request used in the completion/verification state completes: When the request used in the completion/verification state completes:
o If a lease-migrated indication is returned, the discovery o If a lease-migrated indication is returned, the discovery
continues normally. Note that this is so even if all file systems continues normally. Note that this is so even if all file systems
have been traversed, since new migrations could have occurred while the have been traversed, since new migrations could have occurred while the
process was going on. process was going on.
o Otherwise, if there is any record that other requests saw a lease- o Otherwise, if there is any record that other requests saw a lease-
migrated indication, that record is cleared and the verification migrated indication while the request was going on, that record is
request retried. The discovery process remains in completion/ cleared and the verification request retried. The discovery
verification state. process remains in completion/verification state.
o If there have been no lease-migrated indications, the work of o If there have been no lease-migrated indications, the work of
migration discovery is considered completed and it enters the non- migration discovery is considered completed and it enters the non-
operating state. Once it enters this state, subsequent lease- operating state. Once it enters this state, subsequent lease-
migrated indications will trigger a new migration discovery migrated indications will trigger a new migration discovery
process. process.
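The handling of a completed verification request, as listed above, can be sketched as a small state machine. The names State and Discovery are assumptions made for illustration, as is the choice to clear the record when discovery resumes, which the text does not address.

```python
from enum import Enum


class State(Enum):
    OPERATING = "operating"
    VERIFYING = "completion/verification"
    NON_OPERATING = "non-operating"


class Discovery:
    """Hypothetical model of the migration discovery process once it
    has entered the completion/verification state."""

    def __init__(self):
        self.state = State.VERIFYING
        self.others_saw_lease_moved = False

    def note_indication(self):
        """Another request saw a lease-migrated indication."""
        if self.state is State.VERIFYING:
            self.others_saw_lease_moved = True

    def verification_done(self, lease_moved_returned):
        """Apply the three outcomes for the verification request."""
        if lease_moved_returned:
            # Discovery continues normally (record reset: an assumption).
            self.state = State.OPERATING
            self.others_saw_lease_moved = False
        elif self.others_saw_lease_moved:
            # Clear the record; the caller retries the verification.
            self.others_saw_lease_moved = False
        else:
            # No indications at all: discovery is complete.
            self.state = State.NON_OPERATING
        return self.state
```

A result of State.VERIFYING tells the caller to reissue the verification request.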
It should be noted that the process described above is not guaranteed It should be noted that the process described above is not guaranteed
to terminate, as a long series of new migration events might to terminate, as a long series of new migration events might
continually delay the clearing of the LEASE_MOVED indication. To continually delay the clearing of the LEASE_MOVED indication. To
skipping to change at page 36, line 34 skipping to change at page 37, line 50
During the first phase of this process, the client proceeds to During the first phase of this process, the client proceeds to
examine location entries to find the initial network address it will examine location entries to find the initial network address it will
use to continue access to the file system or its replacement. For use to continue access to the file system or its replacement. For
each location entry that the client examines, the process consists of each location entry that the client examines, the process consists of
five steps: five steps:
1. Performing an EXCHANGE_ID directed at the location address. This 1. Performing an EXCHANGE_ID directed at the location address. This
operation is used to register the client-owner with the server, operation is used to register the client-owner with the server,
to obtain a client ID to be used subsequently to communicate with to obtain a client ID to be used subsequently to communicate with
it, to obtain tat client ID's confirmation status and, to it, to obtain that client ID's confirmation status, and to
determine server_owner and scope for the purpose of determining determine server_owner and scope for the purpose of determining
if the entry is trunkable with that previously being used to if the entry is trunkable with that previously being used to
access the file system (i.e. that it represents another network access the file system (i.e. that it represents another network
access path to the same file system and can share locking state access path to the same file system and can share locking state
with it). with it).
2. Making an initial determination of whether migration has 2. Making an initial determination of whether migration has
occurred. The initial determination will be based on whether the occurred. The initial determination will be based on whether the
EXCHANGE_ID results indicate that the current location element is EXCHANGE_ID results indicate that the current location element is
server-trunkable with that used to access the file system when server-trunkable with that used to access the file system when
skipping to change at page 37, line 38 skipping to change at page 39, line 7
During this later phase of the process, further location entries are During this later phase of the process, further location entries are
examined using the abbreviated procedure specified below: examined using the abbreviated procedure specified below:
1. Before the EXCHANGE_ID, the fs name of the location entry is 1. Before the EXCHANGE_ID, the fs name of the location entry is
examined and if it does not match that currently being used, the examined and if it does not match that currently being used, the
entry is ignored. Otherwise, one proceeds as specified by step 1 entry is ignored. Otherwise, one proceeds as specified by step 1
above. above.
2. In the case that the network address is session-trunkable with 2. In the case that the network address is session-trunkable with
one used previously, a BIND_CONN_TO_SESSION is used to access that one used previously, a BIND_CONN_TO_SESSION is used to access that
session using new network address. Otherwise, or if the bind session using the new network address. Otherwise, or if the bind
operation fails, a CREATE_SESSION is done. operation fails, a CREATE_SESSION is done.
3. The verification procedure referred to in step 4 above is used. 3. The verification procedure referred to in step 4 above is used.
However, if it fails, the entry is ignored and the next available However, if it fails, the entry is ignored and the next available
entry is used. entry is used.
10.4. Obtaining Access to Sessions and State after Migration (to be 10.4. Obtaining Access to Sessions and State after Migration (to be
added) added)
In the event that migration has occurred, migration recovery will In the event that migration has occurred, migration recovery will
skipping to change at page 38, line 20 skipping to change at page 39, line 37
existing client ID representing the client to the destination existing client ID representing the client to the destination
server. In this state merger case, Transparent State Migration server. In this state merger case, Transparent State Migration
might or might not have occurred and a determination as to whether might or might not have occurred and a determination as to whether
it has occurred is deferred until sessions are established and the it has occurred is deferred until sessions are established and the
client is ready to begin state recovery. client is ready to begin state recovery.
o If the client ID is a confirmed client ID not previously known to o If the client ID is a confirmed client ID not previously known to
the client, then the client can conclude that the client ID was the client, then the client can conclude that the client ID was
transferred as part of Transparent State Migration. In this transferred as part of Transparent State Migration. In this
transferred client ID case, Transparent State Migration has transferred client ID case, Transparent State Migration has
occurred although some state may have been lost. occurred although some state might have been lost.
Once the client ID has been obtained, it is necessary to obtain Once the client ID has been obtained, it is necessary to obtain
access to sessions to continue communication with the new server. In access to sessions to continue communication with the new server. In
any of the cases in which Transparent State Migration has occurred, any of the cases in which Transparent State Migration has occurred,
it is possible that a session was transferred as well. To deal with it is possible that a session was transferred as well. To deal with
that possibility, clients can, after doing the EXCHANGE_ID, issue a that possibility, clients can, after doing the EXCHANGE_ID, issue a
BIND_CONN_TO_SESSION to connect the transferred session to a BIND_CONN_TO_SESSION to connect the transferred session to a
connection to the new server. If that fails, it is an indication connection to the new server. If that fails, it is an indication
that the session was not transferred and that a new session needs to that the session was not transferred and that a new session needs to
be created to take its place. be created to take its place.
skipping to change at page 39, line 29 skipping to change at page 40, line 46
o In a case in which Transparent State Migration has occurred, and o In a case in which Transparent State Migration has occurred, and
no lock state was lost (as shown by SEQ4_STATUS flags), no lock no lock state was lost (as shown by SEQ4_STATUS flags), no lock
reclaim is necessary. reclaim is necessary.
o In a case in which Transparent State Migration has occurred, and o In a case in which Transparent State Migration has occurred, and
some lock state was lost (as shown by SEQ4_STATUS flags), existing some lock state was lost (as shown by SEQ4_STATUS flags), existing
stateids need to be checked for validity using TEST_STATEID, and stateids need to be checked for validity using TEST_STATEID, and
reclaim used to re-establish any that were not transferred. reclaim used to re-establish any that were not transferred.
For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value
of true should be done before normal use of the file system including of TRUE needs to be done before normal use of the file system
obtaining new locks for the file system. This applies even if no including obtaining new locks for the file system. This applies even
locks were lost and there was no need for any to be reclaimed. if no locks were lost and there was no need for any to be reclaimed.
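The lock-recovery cases above can be sketched as a small decision
function. This is an illustrative sketch only: the function name, the
two boolean inputs, and the step strings are hypothetical, and the
handling of the case in which Transparent State Migration has not
occurred (reclaiming all locks) is an assumption based on the
grace-period reclaim model rather than on text shown here.

```python
from typing import List


def recovery_steps(tsm_occurred: bool, lock_state_lost: bool) -> List[str]:
    """Return the ordered recovery actions for a migrated file system.

    lock_state_lost stands for what the client learned from the
    SEQ4_STATUS flags; how those flags are decoded is out of scope.
    """
    steps: List[str] = []
    if tsm_occurred and lock_state_lost:
        # Some state was transferred: find out which stateids survived.
        steps.append("TEST_STATEID on existing stateids")
        steps.append("reclaim locks whose stateids were not transferred")
    elif not tsm_occurred:
        # Assumption: without Transparent State Migration, every lock
        # has to be reclaimed.
        steps.append("reclaim all locks")
    # Needed in every case, even when nothing had to be reclaimed.
    steps.append("RECLAIM_COMPLETE(rca_one_fs=TRUE)")
    return steps
```

Note that the final RECLAIM_COMPLETE step is emitted unconditionally,
matching the statement that it applies even if no locks were lost.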
10.5. Obtaining Access to Sessions and State after Network Address 10.5. Obtaining Access to Sessions and State after Network Address
Transfer (to be added) Transfer (to be added)
The case in which there is a transfer to a new network address The case in which there is a transfer to a new network address
without migration is similar to that described in Section 10.4 above without migration is similar to that described in Section 10.4 above
in that there is a need to obtain access to needed sessions and in that there is a need to obtain access to needed sessions and
locking state. However, the details are simpler and will vary locking state. However, the details are simpler and will vary
depending on the type of trunking between the address receiving depending on the type of trunking between the address receiving
NFS4ERR_MOVED and that to which the transfer is to be made. NFS4ERR_MOVED and that to which the transfer is to be made.
To make a session available for use, a BIND_CONN_TO_SESSION should be To make a session available for use, a BIND_CONN_TO_SESSION should be
used to obtain access to the session previously in use. Only if this used to obtain access to the session previously in use. Only if this
fails, should a CREATE_SESSION be done. While this procedure mirrors fails, should a CREATE_SESSION be done. While this procedure mirrors
that in Section 10.4 above, there is an important difference in that that in Section 10.4 above, there is an important difference in that
preservation of the session is not purely optional but depends on the preservation of the session is not purely optional but depends on the
type of trunking. type of trunking.
Access to appropriate locking state should need no actions beyond Access to appropriate locking state should need no actions beyond
access to the session. However. the SEQ4_STATUS bits should be access to the session. However, the SEQ4_STATUS bits need to be
checked for lost locking state, including the need to reclaim locks checked for lost locking state, including the need to reclaim locks
after a server reboot. after a server reboot.
11. Server Responsibilities Upon Migration (to be added) 11. Server Responsibilities Upon Migration (to be added)
In order to effect Transparent State Migration and possibly session In the event of file system migration, when the client connects to
migration, the source and server need to co-operate to transfer the destination server, it needs to be able to provide the client
certain client-relevant information. The sections below discuss the continued access to the files it had open on the source server.
information to be transferred but do not define the specifics of the There are two ways to provide this:
transfer protocol. This is left as an implementation choice although
o By provision of an fs-specific grace period, allowing the client
the ability to reclaim its locks, in a fashion similar to what
would have been done in the case of recovery from a server
restart. See Section 11.1 for a more complete discussion.
o By implementing Transparent State Migration possibly in connection
with session migration, the server can provide the client
immediate access to the state built up on the source server, on
the destination.
These features are discussed separately in Sections 11.2 and 11.3,
which discuss Transparent State Migration and session migration
respectively.
All the features described above can involve transfer of lock-related
information between source and destination servers. In some cases
this transfer is a necessary part of the implementation while in
other cases it is a helpful implementation aid which servers might or
might not use. The sub-sections below discuss the information which
would be transferred but do not define the specifics of the transfer
protocol. This is left as an implementation choice although
standards in this area could be developed at a later time. standards in this area could be developed at a later time.
Transparent State Migration and session migration are discussed 11.1. Server Responsibilities in Effecting State Reclaim after
separately, in Sections 11.1 and 11.2 below respectively. In each Migration (to be added)
case, the discussion addresses the issue of providing the client a
consistent view of the transferred state, even though the transfer
might take an extended time.
11.1. Server Responsibilities in Effecting Transparent State Migration In this case, the destination server need have no knowledge of the locks
held on the source server, but relies on the clients to accurately
report (via reclaim operations) the locks previously held, not
allowing new locks to be granted on the migrated file system until the
grace period expires.
During this grace period clients have the opportunity to use reclaim
operations to obtain locks for file system objects within the
migrated file system, in the same way that they do when recovering
from server restart, and the servers typically rely on clients to
accurately report their locks, although they have the option of
subjecting these requests to verification. If the clients only
reclaim locks held on the source server, no conflict can arise. Once
the client has reclaimed its locks, it indicates the completion of
lock reclamation by performing a RECLAIM_COMPLETE specifying
rca_one_fs as TRUE.
While it is not necessary for source and destination servers to co-
operate to transfer information about locks, implementations are
well-advised to consider transferring the following useful
information:
o If information about the set of clients that have locking state
for the transferred file system is available, the destination server will be
able to terminate the grace period once all such clients have
reclaimed their locks, allowing normal locking activity to resume
earlier than it would have otherwise.
o Locking summary information for individual clients (at various
possible levels of detail) can detect some instances in which
clients do not accurately represent the locks held on the source
server.
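The early termination of the grace period described in the first
bullet above can be sketched as follows. This is a minimal sketch
under stated assumptions: the class and method names are
hypothetical, clients are identified by opaque strings, and the
deadline is a plain number supplied by the caller. The transfer of
the client set between servers is not modeled.

```python
from typing import Iterable, Set


class FsGracePeriod:
    """Tracks the fs-specific grace period on a destination server."""

    def __init__(self, clients_with_state: Iterable[str], deadline: float):
        # Clients reported by the source server as holding locking
        # state on the migrated file system (hypothetical input).
        self.pending: Set[str] = set(clients_with_state)
        self.deadline = deadline

    def reclaim_complete(self, client_id: str) -> None:
        # Called when a client performs RECLAIM_COMPLETE with
        # rca_one_fs set to TRUE for this file system.
        self.pending.discard(client_id)

    def in_grace(self, now: float) -> bool:
        # Grace ends when every known client has finished reclaiming,
        # or when the deadline passes, whichever comes first.
        return bool(self.pending) and now < self.deadline
```

Without the transferred client set, a server following this sketch
would have to wait for the deadline in every case.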
11.2. Server Responsibilities in Effecting Transparent State Migration
(to be added) (to be added)
The basic responsibility of the source server in effecting The basic responsibility of the source server in effecting
Transparent State Migration is to make available to the destination Transparent State Migration is to make available to the destination
server a description of each piece of locking state associated with server a description of each piece of locking state associated with
the file system being migrated. In addition to client id string and the file system being migrated. In addition to client id string and
verifier, the source server needs to provide, for each stateid: verifier, the source server needs to provide, for each stateid:
o The stateid including the current sequence value. o The stateid including the current sequence value.
skipping to change at page 42, line 9 skipping to change at page 44, line 30
longer exists. longer exists.
o Sequencing of operations is no longer done using owner-based o Sequencing of operations is no longer done using owner-based
operation sequence numbers. Instead, sequencing is session- operation sequence numbers. Instead, sequencing is session-
based. based.
As a result, when sessions are not transferred, the techniques As a result, when sessions are not transferred, the techniques
discussed in Section 7.2 of [RFC7931] are adequate and will not be discussed in Section 7.2 of [RFC7931] are adequate and will not be
further discussed. further discussed.
11.2. Server Responsibilities in Effecting Session Transfer (to be 11.3. Server Responsibilities in Effecting Session Transfer (to be
added) added)
The basic responsibility of the source server in effecting session The basic responsibility of the source server in effecting session
transfer is to make available to the destination server a description transfer is to make available to the destination server a description
of the current state of each slot within the session, including: of the current state of each slot within the session, including:
o The last sequence value received for that slot. o The last sequence value received for that slot.
o Whether there is cached reply data for the last request executed o Whether there is cached reply data for the last request executed
and, if so, the cached reply. and, if so, the cached reply.
skipping to change at page 44, line 8 skipping to change at page 46, line 29
An important issue is that the specification needs to take note of An important issue is that the specification needs to take note of
all potential COMPOUNDs, even if they might be unlikely in practice. all potential COMPOUNDs, even if they might be unlikely in practice.
For example, a COMPOUND is allowed to access multiple file systems For example, a COMPOUND is allowed to access multiple file systems
and might perform non-idempotent operations in some of them before and might perform non-idempotent operations in some of them before
accessing a file system being migrated. Also, a COMPOUND may return accessing a file system being migrated. Also, a COMPOUND may return
considerable data in the response, before being rejected with considerable data in the response, before being rejected with
NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as
sa_cachethis. sa_cachethis.
To address these issues, the destination server MAY do any of the To address these issues, a destination server MAY do any of the
following. following when implementing session transfer.
o Avoid enforcing any sequencing semantics for a particular slot o Avoid enforcing any sequencing semantics for a particular slot
until the client has established the starting sequence for that until the client has established the starting sequence for that
slot on the destination server. slot on the destination server.
o For each slot, avoid returning a cached reply returning o For each slot, avoid returning a cached reply returning
NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established
the starting sequence for that slot on the destination server. the starting sequence for that slot on the destination server.
o Until the client has established the starting sequence for a o Until the client has established the starting sequence for a
particular slot on the destination server, avoid reporting particular slot on the destination server, avoid reporting
NFS4ERR_SEQ_MISORDERED or return a cached reply returning NFS4ERR_SEQ_MISORDERED or return a cached reply returning
NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of
a series of operations where the response is NFS4_OK until the a series of operations where the response is NFS4_OK until the
final error. final error.
12. Changes to RFC5661 outside Section 11 12. fs_locations_info
12.1. Updates to treatment of fs_locations_info
Various elements of the fs_locations_info attribute contain
information that applies to either a specific filesystem replica or
to a network path or set of network paths used to access such a
replica. The existing treatment of fs_locations_info (in
Section 11.10 of [RFC5661]) does not clearly distinguish these cases,
in part because the document did not clearly distinguish replicas
from the paths used to access them.
In addition, special clarification is needed in the following areas:
o With regard to the handling of FSLI4GF_GOING, it needs to be made
clear that this only applies to the unavailability of a replica
rather than to a path to access a replica.
o In describing the appropriate value for a server to use for
fli_valid_for, it needs to be made clear that there is no need for
the client to frequently fetch the fs_locations_info value to be
prepared for shifts in trunking patterns.
o Clarification of the rules for extensions of the fls_info needs to
be provided. The existing treatment reflects the extension model
in effect at the time [RFC5661] was written, and needs to be
updated in accord with the extension model described in [RFC8178].
12.2. The Attribute fs_locations_info (as updated)
The fs_locations_info attribute is intended as a more functional
replacement for the fs_locations attribute which will continue to
exist and be supported. Clients can use it to get a more complete
set of data about alternative file system locations, including
additional network paths to access replicas in use and additional
replicas. When the server does not support fs_locations_info,
fs_locations can be used to get a subset of the data. A server that
supports fs_locations_info MUST support fs_locations as well.
There is additional data present in fs_locations_info, that is not
available in fs_locations:
o Attribute continuity information. This information will allow a
client to select a replica that meets the transparency
requirements of the applications accessing the data and to
leverage optimizations due to the server guarantees of attribute
continuity (e.g., if the change attribute of a file of the file
system is continuous between multiple replicas, the client does
not have to invalidate the file's cache when switching to a
different replica).
o File system identity information that indicates when multiple
replicas, from the client's point of view, correspond to the same
target file system, allowing them to be used interchangeably,
without disruption, as distinct synchronized replicas of the same
file data.
Note that having two replicas with common identity information is
distinct from the case of two (trunked) paths to the same replica.
o Information that will bear on the suitability of various replicas,
depending on the use that the client intends. For example, many
applications need an absolutely up-to-date copy (e.g., those that
write), while others may only need access to the most up-to-date
copy reasonably available.
o Server-derived preference information for replicas, which can be
used to implement load-balancing while giving the client the
entire file system list to be used in case the primary fails.
The fs_locations_info attribute is structured similarly to the
fs_locations attribute. A top-level structure (fs_locations_info4)
contains the entire attribute including the root pathname of the file
system and an array of lower-level structures that define replicas
that share a common rootpath on their respective servers. The lower-
level structure in turn (fs_locations_item4) contains a specific
pathname and information on one or more individual network access
paths. For that last lowest level, fs_locations_info has an
fs_locations_server4 structure that contains per-server-replica
information in addition to the location entry. This per-server-
replica information includes a nominally opaque array, fls_info,
within which specific pieces of information are located at the
specific indices listed below.
Two fs_locations_server4 entries that are within different
fs_locations_item4 structures are never trunkable, while two entries
within the same fs_locations_item4 structure might or might not be
trunkable. Two entries that are trunkable will have identical
identity information, although, as noted above, the converse is not
the case.
The attribute will always contain at least a single
fs_locations_server4 entry. Typically, there will be an entry with
the FSLI4GF_CUR_REQ flag set, although in the case of a referral
there will be no entry with that flag set.
It should be noted that fs_locations_info attributes returned by
servers for various replicas may differ for various reasons. One
server may know about a set of replicas that are not known to other
servers. Further, compatibility attributes may differ. Filehandles
might be of the same class going from replica A to replica B but not
going in the reverse direction. This might happen because the
filehandles are the same, but replica B's server implementation might
not have provision to note and report that equivalence.
The fs_locations_info attribute consists of a root pathname
(fli_fs_root, just like fs_root in the fs_locations attribute),
together with an array of fs_locations_item4 structures. The
fs_locations_item4 structures in turn consist of a root pathname
(fli_rootpath) together with an array (fli_entries) of elements of
data type fs_locations_server4, all defined as follows.
<CODE BEGINS>
/*
* Defines an individual server access path
*/
struct fs_locations_server4 {
int32_t fls_currency;
opaque fls_info<>;
utf8str_cis fls_server;
};
/*
* Byte indices of items within
* fls_info: flag fields, class numbers,
* bytes indicating ranks and orders.
*/
const FSLI4BX_GFLAGS = 0;
const FSLI4BX_TFLAGS = 1;
const FSLI4BX_CLSIMUL = 2;
const FSLI4BX_CLHANDLE = 3;
const FSLI4BX_CLFILEID = 4;
const FSLI4BX_CLWRITEVER = 5;
const FSLI4BX_CLCHANGE = 6;
const FSLI4BX_CLREADDIR = 7;
const FSLI4BX_READRANK = 8;
const FSLI4BX_WRITERANK = 9;
const FSLI4BX_READORDER = 10;
const FSLI4BX_WRITEORDER = 11;
/*
* Bits defined within the general flag byte.
*/
const FSLI4GF_WRITABLE = 0x01;
const FSLI4GF_CUR_REQ = 0x02;
const FSLI4GF_ABSENT = 0x04;
const FSLI4GF_GOING = 0x08;
const FSLI4GF_SPLIT = 0x10;
/*
* Bits defined within the transport flag byte.
*/
const FSLI4TF_RDMA = 0x01;
/*
* Defines a set of replicas sharing
* a common value of the rootpath
* within the corresponding
* single-server namespaces.
*/
struct fs_locations_item4 {
fs_locations_server4 fli_entries<>;
pathname4 fli_rootpath;
};
/*
* Defines the overall structure of
* the fs_locations_info attribute.
*/
struct fs_locations_info4 {
uint32_t fli_flags;
int32_t fli_valid_for;
pathname4 fli_fs_root;
fs_locations_item4 fli_items<>;
};
/*
* Flag bits in fli_flags.
*/
const FSLI4IF_VAR_SUB = 0x00000001;
typedef fs_locations_info4 fattr4_fs_locations_info;
<CODE ENDS>
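As an illustration of how the byte indices and flag bits defined
above might be applied, the following sketch decodes the general
flag byte of an fls_info array. The constant values are copied from
the XDR above. The helper names (fls_byte, general_flags) are
hypothetical, and returning None for an index beyond the end of the
array is an assumption made for this sketch, not a rule stated in
the text.

```python
from typing import Optional, Set

# Byte indices within fls_info (values from the XDR above).
FSLI4BX_GFLAGS = 0
FSLI4BX_TFLAGS = 1
FSLI4BX_READRANK = 8

# Bits within the general flag byte (values from the XDR above).
GFLAG_NAMES = {
    0x01: "FSLI4GF_WRITABLE",
    0x02: "FSLI4GF_CUR_REQ",
    0x04: "FSLI4GF_ABSENT",
    0x08: "FSLI4GF_GOING",
    0x10: "FSLI4GF_SPLIT",
}


def fls_byte(fls_info: bytes, index: int) -> Optional[int]:
    """Return the byte at index, or None if the array is too short."""
    return fls_info[index] if index < len(fls_info) else None


def general_flags(fls_info: bytes) -> Set[str]:
    """Return the names of the defined general flag bits that are set."""
    value = fls_byte(fls_info, FSLI4BX_GFLAGS)
    if value is None:
        return set()
    # Undefined bits are ignored rather than interpreted.
    return {name for bit, name in GFLAG_NAMES.items() if value & bit}
```

For example, an fls_info array whose first byte is 0x03 decodes to
FSLI4GF_WRITABLE and FSLI4GF_CUR_REQ.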
As noted above, the fs_locations_info attribute, when supported, may
be requested of absent file systems without causing NFS4ERR_MOVED to
be returned. It is generally expected that it will be available for
both present and absent file systems even if only a single
fs_locations_server4 entry is present, designating the current
(present) file system, or two fs_locations_server4 entries
designating the previous location of an absent file system (the one
just referenced) and its successor location. Servers are strongly
urged to support this attribute on all file systems if they support
it on any file system.
The data presented in the fs_locations_info attribute may be obtained
by the server in any number of ways, including specification by the
administrator or by current protocols for transferring data among
replicas and protocols not yet developed. NFSv4.1 only defines how
this information is presented by the server to the client.
12.2.1. The fs_locations_server4 Structure (as updated)
The fs_locations_server4 structure consists of the following items in
addition to the fls_server field which specifies a network address or
set of addresses to be used to access the specified file system.
Note that both fls_currency and fls_info specify attributes of the file system
replica and should not be different when there are multiple
fs_locations_server4 structures for the same replica, each specifying
a network path to the chosen replica.
o An indication of how up-to-date the file system is (fls_currency)
in seconds. This value is relative to the master copy. A
negative value indicates that the server is unable to give any
reasonably useful value here. A value of zero indicates that the
file system is the actual writable data or a reliably coherent and
fully up-to-date copy. Positive values indicate how out-of-date
this copy can normally be before it is considered for update.
Such a value is not a guarantee that such updates will always be
performed on the required schedule but instead serves as a hint
about how far the copy of the data would be expected to be behind
the most up-to-date copy.
o A counted array of one-byte values (fls_info) containing
information about the particular file system instance. This data
includes general flags, transport capability flags, file system
equivalence class information, and selection priority information.
The encoding will be discussed below.
o The server string (fls_server). For the case of the replica
currently being accessed (via GETATTR), a zero-length string MAY
be used to indicate the current address being used for the RPC
call. The fls_server field can also be an IPv4 or IPv6 address,
formatted the same way as an IPv4 or IPv6 address in the "server"
field of the fs_location4 data type (see Section 11.9 of
[RFC5661]).
With the exception of the transport-flag field (at offset
FSLI4BX_TFLAGS within the fls_info array), all of this data applies to
the replica specified by the entry, rather than the specific network
path used to access it.
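The three ranges of fls_currency described above can be sketched as
a simple classification. The function names, the returned labels,
and the max_lag_seconds policy parameter are hypothetical. Treating
an unknown (negative) value as unacceptable is an assumption about
client policy, not something the text requires.

```python
def classify_currency(fls_currency: int) -> str:
    """Map an fls_currency value (in seconds) to a coarse label."""
    if fls_currency < 0:
        # The server cannot give a reasonably useful value.
        return "unknown"
    if fls_currency == 0:
        # Actual writable data or a fully up-to-date copy.
        return "current"
    # A hint only: the copy may normally lag by about this much.
    return "may-lag"


def acceptable(fls_currency: int, max_lag_seconds: int) -> bool:
    """Hypothetical client policy: accept replicas within a lag bound."""
    if fls_currency < 0:
        return False  # assumption: unknown currency is not trusted
    return fls_currency <= max_lag_seconds
```

Because positive values are hints rather than guarantees, a client
using a check like this would still need to tolerate a replica
being further behind than advertised.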
Data within the fls_info array is in the form of 8-bit data items
with constants giving the offsets within the array of various values
describing this particular file system instance. This style of
definition was chosen, in preference to explicit XDR structure
definitions for these values, for a number of reasons.
o The kinds of data in the fls_info array, representing flags, file
system classes, and priorities among sets of file systems
representing the same data, are such that 8 bits provide a quite
acceptable range of values. Even where there might be more than
256 such file system instances, having more than 256 distinct
classes or priorities is unlikely.
o Explicit definition of the various specific data items within XDR
would limit expandability in that any extension within a minor version would
require yet another attribute, leading to specification and
implementation clumsiness. In the context of the NFSv4 extension
model in effect at the time fs_locations_info was designed (i.e.
that described in [RFC5661]), this would necessitate a new minor
version to effect any Standards Track extension to the data in
fls_info.
The set of fls_info data is subject to expansion in a future minor
version, or in a Standards Track RFC, within the context of a single
minor version. The server SHOULD NOT send and the client MUST NOT
use indices within the fls_info array or flag bits that are not
defined in Standards Track RFCs.
In light of the new extension model defined in [RFC8178] and the fact
that the individual items within fls_info are not explicitly
referenced in the XDR, the following practices should be followed
when extending or otherwise changing the structure of the data
returned in fls_info within the scope of a single minor version.
o All extensions need to be described by Standards Track documents.
There is no need for such documents to be marked as updating
[RFC5661] or this document.
o It needs to be made clear whether the information in any added
data items applies to the replica specified by the entry or to the
specific network paths specified in the entry.
o There needs to be a reliable way defined to determine whether the
server is aware of the extension. This may be based on the length
field of the fls_info array, but it is more flexible to provide
fs-scope or server-scope attributes to indicate what extensions
are provided.
This encoding scheme can be adapted to the specification of multi-
byte numeric values, even though none are currently defined. If
extensions are made via Standards Track RFCs, multi-byte quantities
will be encoded as a range of bytes with a range of indices, with the
bytes interpreted in big-endian byte order. Further, any such index
assignments will be constrained by the need for the relevant
quantities not to cross XDR word boundaries.
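The multi-byte encoding rule above can be sketched as follows. No
multi-byte values are currently defined, so the function name and
the indices used in the example are purely hypothetical. The sketch
assumes 4-byte XDR words counted from the start of the fls_info
array, and it returns None when the array is too short to contain
the value.

```python
from typing import Optional

XDR_WORD = 4  # bytes per XDR word


def read_multibyte(fls_info: bytes, first_index: int,
                   length: int) -> Optional[int]:
    """Read a big-endian value occupying a range of fls_info indices."""
    if length < 1:
        raise ValueError("length must be at least 1")
    last_index = first_index + length - 1
    # The quantity must not cross an XDR word boundary.
    if first_index // XDR_WORD != last_index // XDR_WORD:
        raise ValueError("index range crosses an XDR word boundary")
    if last_index >= len(fls_info):
        return None  # assumption: the server did not send this value
    return int.from_bytes(fls_info[first_index:last_index + 1], "big")
```

With a hypothetical two-byte value at indices 12 and 13 holding the
bytes 0x12 and 0x34, this returns 0x1234, while a two-byte value at
indices 3 and 4 is rejected because it spans two XDR words.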
The fls_info array currently contains:
o Two 8-bit flag fields, one devoted to general file-system
characteristics and a second reserved for transport-related
capabilities.
o Six 8-bit class values that define various file system equivalence
classes as explained below.
o Four 8-bit priority values that govern file system selection as
explained below.
The general file system characteristics flag (at byte index
FSLI4BX_GFLAGS) has the following bits defined within it:
o FSLI4GF_WRITABLE indicates that this file system target is
writable, allowing it to be selected by clients that may need to
write on this file system. When the current file system instance
is writable and is defined as of the same simultaneous use class
(as specified by the value at index FSLI4BX_CLSIMUL) to which the
client was previously writing, then it must incorporate within its
data any committed write made on the source file system instance.
See Section 8.6, which discusses the write-verifier class. While
there is no harm in not setting this flag for a file system that
turns out to be writable, turning the flag on for a read-only file
system can cause problems for clients that select a migration or
replication target based on the flag and then find themselves
unable to write.
o FSLI4GF_CUR_REQ indicates that this replica is the one on which
the request is being made. Only a single server entry may have
this flag set and, in the case of a referral, no entry will have
it set. Note that this flag might be set even if the request was
made on a network access path different from any of those
specified in the current entry.
o FSLI4GF_ABSENT indicates that this entry corresponds to an absent
file system replica. It can only be set if FSLI4GF_CUR_REQ is
set. When both such bits are set, it indicates that a file system
instance is not usable but that the information in the entry can
be used to determine the sorts of continuity available when
switching from this replica to other possible replicas. Since
this bit can only be true if FSLI4GF_CUR_REQ is true, the value
could be determined using the fs_status attribute, but the
information is also made available here for the convenience of the
client. An entry with this bit, since it represents a true file
system (albeit absent), does not appear in the event of a
referral, but only when a file system has been accessed at this
location and has subsequently been migrated.
o FSLI4GF_GOING indicates that a replica, while still available,
should not be used further. The client, if using it, should make
an orderly transfer to another file system instance as
expeditiously as possible. It is expected that file systems going
out of service will be announced as FSLI4GF_GOING some time before
the actual loss of service. It is also expected that the
fli_valid_for value will be sufficiently small to allow clients to
detect and act on scheduled events, while large enough that the
cost of the requests to fetch the fs_locations_info values will
not be excessive. Values on the order of ten minutes seem
reasonable.
When this flag is seen as part of a transition into a new file
system, a client might choose to transfer immediately to another
replica, or it may reference the current file system and only
transition when a migration event occurs. Similarly, when this
flag appears on a replica in a referral, clients would likely
avoid being referred to this instance whenever there is another
choice.
This flag, like the other items within fls_info, applies to the
replica, rather than to a particular path to that replica. When
it appears, a transition to a new replica, rather than to a
different path to the same replica, is indicated.
o FSLI4GF_SPLIT indicates that when a transition occurs from the
current file system instance to this one, the replacement may
consist of multiple file systems. In this case, the client has to
be prepared for the possibility that objects on the same file
system before migration will be on different ones after. Note
that FSLI4GF_SPLIT is not incompatible with the file systems
belonging to the same fileid class since, if one has a set of
fileids that are unique within a file system, each subset assigned
to a smaller file system after migration would not have any
conflicts internal to that file system.
A client, in the case of a split file system, will interrogate
existing files with which it has continuing connection (it is free
to simply forget cached filehandles). If the client remembers the
directory filehandle associated with each open file, it may
proceed upward using LOOKUPP to find the new file system
boundaries. Note that in the event of a referral, there will not
be any such files and so these actions will not be performed.
Instead, a reference to a portion of the original file system now
split off into other file systems will encounter an fsid change
and possibly a further referral.
Once the client recognizes that one file system has been split
into two, it can prevent the disruption of running applications by
presenting the two file systems as a single one until a convenient
point to recognize the transition, such as a restart. This would
require a mapping from the server's fsids to fsids as seen by the
client, but this is already necessary for other reasons. As noted
above, existing fileids within the two descendant file systems
will not conflict. Providing non-conflicting fileids for newly
created files on the split file systems is the responsibility of
the server (or servers working in concert). The server can encode
filehandles such that filehandles generated before the split event
can be discerned from those generated after the split, allowing
the server to determine when the need for emulating two file
systems as one is over.
Although it is possible for this flag to be present in the event
of referral, it would generally be of little interest to the
client, since the client is not expected to have information
regarding the current contents of the absent file system.
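The constraints stated above for the general flags can be checked
mechanically. The following is a minimal sketch, not a definitive
implementation. The byte index and bit values are assumed to match
the XDR definitions in [RFC5661]. The helper names, and the treatment
of a missing byte as zero, are assumptions made for illustration.

```python
from typing import List, Set

# Assumed to match the XDR in [RFC5661]; verify before relying on them.
FSLI4BX_GFLAGS = 0
FSLI4GF_WRITABLE = 0x01
FSLI4GF_CUR_REQ = 0x02
FSLI4GF_ABSENT = 0x04
FSLI4GF_GOING = 0x08
FSLI4GF_SPLIT = 0x10

_FLAG_NAMES = {
    FSLI4GF_WRITABLE: "WRITABLE",
    FSLI4GF_CUR_REQ: "CUR_REQ",
    FSLI4GF_ABSENT: "ABSENT",
    FSLI4GF_GOING: "GOING",
    FSLI4GF_SPLIT: "SPLIT",
}


def decode_gflags(fls_info: bytes) -> Set[str]:
    """Return the names of the general flags set in one server entry."""
    # Assumption: a missing general-flag byte is treated as zero.
    gflags = fls_info[FSLI4BX_GFLAGS] if len(fls_info) > FSLI4BX_GFLAGS else 0
    return {name for bit, name in _FLAG_NAMES.items() if gflags & bit}


def check_entries(entries: List[bytes]) -> List[str]:
    """Report violations of the rules stated for CUR_REQ and ABSENT."""
    problems = []
    cur_req_count = 0
    for i, info in enumerate(entries):
        flags = decode_gflags(info)
        if "CUR_REQ" in flags:
            cur_req_count += 1
        # ABSENT can only be set together with CUR_REQ.
        if "ABSENT" in flags and "CUR_REQ" not in flags:
            problems.append(f"entry {i}: ABSENT set without CUR_REQ")
    # Only a single server entry may have CUR_REQ set.
    if cur_req_count > 1:
        problems.append("more than one entry has CUR_REQ set")
    return problems
```

A client might run such a check on each fetched attribute value and
disregard entries that fail it. How a client reacts to a malformed
value is not specified here.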
The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the
following bits related to the transport capabilities of the specific
network path(s) specified by the entry.
o FSLI4TF_RDMA indicates that any specified network paths provide
NFSv4.1 clients access using an RDMA-capable transport.
Attribute continuity and file system identity information are
expressed by defining equivalence relations on the sets of file
systems presented to the client. Each such relation is expressed as
a set of file system equivalence classes. For each relation, a file
system has an 8-bit class number. Two file systems belong to the
same class if both have identical non-zero class numbers. Zero is
treated as non-matching. Most often, the relevant question for the
client will be whether a given replica is identical to / continuous
with the current one in a given respect, but the information should
be available also as to whether two other replicas match in that
respect as well.
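The matching rule just described can be sketched as follows. The
function name is hypothetical, and treating a byte missing from a
short fls_info array as zero is an assumption of the sketch.

```python
def same_class(info_a: bytes, info_b: bytes, index: int) -> bool:
    """True if two entries share a non-zero class number at 'index'."""
    # Assumption: a byte absent from a short array counts as zero.
    a = info_a[index] if index < len(info_a) else 0
    b = info_b[index] if index < len(info_b) else 0
    # Zero never matches anything, including another zero.
    return a != 0 and a == b
```

Because the comparison is symmetric, the same function serves both to
compare a candidate replica with the current one and to compare two
other replicas with each other.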
The following fields specify the file system's class numbers for the
equivalence relations used in determining the nature of file system
transitions. See Sections 6 through 11 and their various subsections
for details about how this information is to be used. Servers may
assign these values as they wish, so long as file system instances
that share the same value have the specified relationship to one
another; conversely, file systems that have the specified
relationship to one another share a common class value. As each
instance entry is added, the relationships of this instance to
previously entered instances can be consulted, and if one is found
that bears the specified relationship, that entry's class value can
be copied to the new entry. When no such previous entry exists, a
new value for that byte index (not previously used) can be selected,
most likely by incrementing the value of the last class value
assigned for that index.
o The field with byte index FSLI4BX_CLSIMUL defines the
simultaneous-use class for the file system.
o The field with byte index FSLI4BX_CLHANDLE defines the handle
class for the file system.
o The field with byte index FSLI4BX_CLFILEID defines the fileid
class for the file system.
o The field with byte index FSLI4BX_CLWRITEVER defines the write-
verifier class for the file system.
o The field with byte index FSLI4BX_CLCHANGE defines the change
class for the file system.
o The field with byte index FSLI4BX_CLREADDIR defines the readdir
class for the file system.
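The incremental assignment procedure described before the list above
might be implemented by a server along the following lines for any
one of these byte indices. This is a sketch under stated assumptions:
the predicate "related" stands in for whatever test the server uses
to decide that two instances bear the specified relationship, it is
assumed to be an equivalence relation, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def assign_classes(instances: Sequence[T],
                   related: Callable[[T, T], bool]) -> List[int]:
    """Assign one 8-bit class number per instance for a single index."""
    classes: List[int] = []
    next_value = 1  # zero is reserved, since it is treated as non-matching
    for i, inst in enumerate(instances):
        for j in range(i):
            if related(instances[j], inst):
                # Copy the class value of the earlier, related entry.
                classes.append(classes[j])
                break
        else:
            # No earlier entry is related, so select an unused value.
            if next_value > 255:
                raise ValueError("more than 255 classes for one index")
            classes.append(next_value)
            next_value += 1
    return classes
```

For example, with instances named "a1", "a2", "b1", and "a3", and a
relation that holds when the first letters match, the result is
[1, 1, 2, 1].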
Server-specified preference information is also provided via 8-bit
values within the fls_info array. The values provide a rank and an
order (see below) to be used with separate values specifiable for the
cases of read-only and writable file systems. These values are
compared for different file systems to establish the server-specified
preference, with lower values indicating "more preferred".
Rank is used to express a strict server-imposed ordering on clients,
with lower values indicating "more preferred". Clients should
attempt to use all replicas with a given rank before they use one
with a higher rank. Only if all of those file systems are
unavailable should the client proceed to those of a higher rank.
Because specifying a rank will override client preferences, servers
should be conservative about using this mechanism, particularly when
the environment is one in which client communication characteristics
are neither tightly controlled nor visible to the server.
Within a rank, the order value is used to specify the server's
preference to guide the client's selection when the client's own
preferences are not controlling, with lower values of order
indicating "more preferred". If replicas are approximately equal in
all respects, clients should defer to the order specified by the
server. When clients look at server latency as part of their
selection, they are free to use this criterion but it is suggested
that when latency differences are not significant, the server-
specified order should guide selection.
o The field at byte index FSLI4BX_READRANK gives the rank value to
be used for read-only access.
o The field at byte index FSLI4BX_READORDER gives the order value to
be used for read-only access.
o The field at byte index FSLI4BX_WRITERANK gives the rank value to
be used for writable access.
o The field at byte index FSLI4BX_WRITEORDER gives the order value
to be used for writable access.
Depending on the potential need for write access by a given client,
one of the pairs of rank and order values is used. The read rank and
order should only be used if the client knows that only reading will
ever be done or if it is prepared to switch to a different replica in
the event that any write access capability is required in the future.
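A client's use of rank and order might look like the sketch below,
which applies only the server-specified preference. The byte indices
are assumed to match the XDR in [RFC5661]. The function names and the
"available" callback are hypothetical. A real client would also weigh
its own criteria, such as latency, within a rank, and would likely
exclude entries lacking FSLI4GF_WRITABLE when write access is needed.

```python
from typing import Callable, List, Tuple

# Assumed to match the XDR in [RFC5661]; verify before relying on them.
FSLI4BX_READRANK = 8
FSLI4BX_WRITERANK = 9
FSLI4BX_READORDER = 10
FSLI4BX_WRITEORDER = 11


def _byte_at(fls_info: bytes, index: int) -> int:
    # Assumption: a byte absent from a short array counts as zero.
    return fls_info[index] if index < len(fls_info) else 0


def preference_key(fls_info: bytes, need_write: bool) -> Tuple[int, int]:
    """Return (rank, order) for the kind of access the client needs."""
    if need_write:
        rank_ix, order_ix = FSLI4BX_WRITERANK, FSLI4BX_WRITEORDER
    else:
        rank_ix, order_ix = FSLI4BX_READRANK, FSLI4BX_READORDER
    return (_byte_at(fls_info, rank_ix), _byte_at(fls_info, order_ix))


def order_candidates(entries: List[bytes], need_write: bool,
                     available: Callable[[bytes], bool] = lambda e: True
                     ) -> List[bytes]:
    """Sort usable entries so that lower rank, then lower order, is first."""
    usable = [e for e in entries if available(e)]
    return sorted(usable, key=lambda e: preference_key(e, need_write))
```

Sorting on the (rank, order) pair means that every replica of a given
rank is tried before any replica of a higher rank, which is the
strict ordering that rank is meant to impose.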
12.2.2. The fs_locations_info4 Structure (as updated)
The fs_locations_info4 structure, encoding the fs_locations_info
attribute, contains the following:
o The fli_flags field, which contains general flags that affect the
interpretation of this fs_locations_info4 structure and all
fs_locations_item4 structures within it. The only flag currently
defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that
are not defined should always be returned as zero.
o The fli_fs_root field, which contains the pathname of the root of
the current file system on the current server, just as it does in
the fs_locations4 structure.
o An array called fli_items of fs_locations_item4 structures, which
contain information about replicas of the current file system.
Where the current file system is actually present, or has been
present, i.e., this is not a referral situation, one of the
fs_locations_item4 structures will contain an fs_locations_server4
for the current server. This structure will have FSLI4GF_ABSENT
set if the current file system is absent, i.e., normal access to
it will return NFS4ERR_MOVED.
o The fli_valid_for field specifies a time in seconds for which it
is reasonable for a client to use the fs_locations_info attribute
without refetch. The fli_valid_for value does not provide a
guarantee of validity since servers can unexpectedly go out of
service or become inaccessible for any number of reasons. Clients
are well-advised to refetch this information for an actively
accessed file system at every fli_valid_for seconds. This is
particularly important when file system replicas may go out of
service in a controlled way using the FSLI4GF_GOING flag to
communicate an ongoing change. The server should set
fli_valid_for to a value that allows well-behaved clients to
notice the FSLI4GF_GOING flag and make an orderly switch before
the loss of service becomes effective. If this value is zero,
then no refetch interval is appropriate and the client need not
refetch this data on any particular schedule. In the event of a
transition to a new file system instance, a new value of the
fs_locations_info attribute will be fetched at the destination.
It is to be expected that this may have a different fli_valid_for
value, which the client should then use in the same fashion as the
previous value. Because a refetch of the attribute causes
information from all component entries to be refetched, the server
will typically provide a low value for this field if any of the
replicas are likely to go out of service in a short time frame.
Note that, because of the ability of the server to return
NFS4ERR_MOVED to trigger the use of different paths, when alternate
trunked paths are available, there is generally no need to use low
values of fli_valid_for in connection with the management of
alternate paths to the same replica.
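The refetch behavior described for fli_valid_for can be sketched as
follows. The function names are hypothetical, and the policy of
refetching only for actively accessed file systems follows the advice
in the text rather than any requirement.

```python
from typing import Optional


def next_refetch_time(fetched_at: float,
                      fli_valid_for: int) -> Optional[float]:
    """Return when a refetch becomes advisable, or None if never."""
    # A value of zero means no particular refetch schedule applies.
    if fli_valid_for == 0:
        return None
    return fetched_at + fli_valid_for


def should_refetch(now: float, fetched_at: float, fli_valid_for: int,
                   actively_accessed: bool) -> bool:
    """Decide whether to refetch fs_locations_info at time 'now'."""
    deadline = next_refetch_time(fetched_at, fli_valid_for)
    return actively_accessed and deadline is not None and now >= deadline
```

After a transition to a new file system instance, the client would
restart this calculation using the fli_valid_for value fetched at the
destination.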
The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable
substitution is to be enabled. See Section 12.2.3 for an explanation
of variable substitution.
12.2.3. The fs_locations_item4 Structure (as updated)
The fs_locations_item4 structure contains a pathname (in the field
fli_rootpath) that encodes the path of the target file system
replicas on the set of servers designated by the included
fs_locations_server4 entries. The precise manner in which this
target location is specified depends on the value of the
FSLI4IF_VAR_SUB flag within the associated fs_locations_info4
structure.
If this flag is not set, then fli_rootpath simply designates the
location of the target file system within each server's single-server
namespace just as it does for the rootpath within the fs_location4
structure. When this bit is set, however, component entries of a
certain form are subject to client-specific variable substitution so
as to allow a degree of namespace non-uniformity in order to
accommodate the selection of client-specific file system targets to
adapt to different client architectures or other characteristics.
When such substitution is in effect, a variable beginning with the
string "${" and ending with the string "}" and containing a colon is
to be replaced by the client-specific value associated with that
variable. The string "unknown" should be used by the client when it
has no value for such a variable. The pathname resulting from such
substitutions is used to designate the target file system, so that
different clients may have different file systems, corresponding to
that location in the multi-server namespace.
As mentioned above, such substituted pathname variables contain a
colon. The part before the colon is to be a DNS domain name, and the
part after is to be a case-insensitive alphanumeric string.
Where the domain is "ietf.org", only variable names defined in this
document or subsequent Standards Track RFCs are subject to such
substitution. Organizations are free to use their domain names to
create their own sets of client-specific variables, to be subject to
such substitution. In cases where such variables are intended to be
used more broadly than a single organization, publication of an
Informational RFC defining such variables is RECOMMENDED.
The variable ${ietf.org:CPU_ARCH} is used to denote the CPU
architecture for which object files are compiled. This specification
does not
limit the acceptable values (except that they must be valid UTF-8
strings), but such values as "x86", "x86_64", and "sparc" would be
expected to be used in line with industry practice.
The variable ${ietf.org:OS_TYPE} is used to denote the operating
system, and thus the kernel and library APIs, for which code might be
compiled. This specification does not limit the acceptable values
(except that they must be valid UTF-8 strings), but such values as
"linux" and "freebsd" would be expected to be used in line with
industry practice.
The variable ${ietf.org:OS_VERSION} is used to denote the operating
system version, and thus the specific details of versioned
interfaces, for which code might be compiled. This specification
does not limit the acceptable values (except that they must be valid
UTF-8 strings). However, combinations of numbers and letters with
interspersed dots would be expected to be used in line with industry
practice, with the details of the version format depending on the
specific value of the variable ${ietf.org:OS_TYPE} with which it is
used.
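Client-side substitution over the components of fli_rootpath might be
sketched as below. Several points are assumptions of the sketch and
not statements of this specification: the regular expression accepts
underscores in the variable name (as the variable names above
require), the domain part is compared case-insensitively, the
client's values are held in a dictionary keyed by "domain:NAME", and
the function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches "${domain:name}". Underscore is accepted in the name part
# because the variables defined above (e.g., CPU_ARCH) contain it.
_VAR = re.compile(r"\$\{([^:{}]+):([A-Za-z0-9_]+)\}")


def substitute_component(component: str, values: Dict[str, str]) -> str:
    """Replace each variable in one pathname component."""
    def repl(match: "re.Match[str]") -> str:
        # Normalize case so that lookups are case-insensitive.
        key = f"{match.group(1).lower()}:{match.group(2).upper()}"
        # "unknown" is used when the client has no value for the variable.
        return values.get(key, "unknown")
    return _VAR.sub(repl, component)


def substitute_rootpath(components: List[str], values: Dict[str, str],
                        var_sub_enabled: bool) -> List[str]:
    """Apply substitution only when FSLI4IF_VAR_SUB is set."""
    if not var_sub_enabled:
        return list(components)
    return [substitute_component(c, values) for c in components]
```

With values for ietf.org:CPU_ARCH and ietf.org:OS_TYPE of "x86_64"
and "linux", and no value for ietf.org:OS_VERSION, the components
"export", "${ietf.org:CPU_ARCH}", and
"${ietf.org:OS_TYPE}-${ietf.org:OS_VERSION}" become "export",
"x86_64", and "linux-unknown".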
Use of these variables could result in the direction of different
clients to different file systems on the same server, as appropriate
to particular clients. In cases in which the target file systems are
located on different servers, a single server could serve as a
referral point so that each valid combination of variable values
would designate a referral hosted on a single server, with the
targets of those referrals on a number of different servers.
Because namespace administration is affected by the values selected
to substitute for various variables, clients should provide
convenient means of determining what variable substitutions a client
will implement, as well as, where appropriate, providing means to
control the substitutions to be used. The exact means by which this
will be done is outside the scope of this specification.
Although variable substitution is most suitable for use in the
context of referrals, it may be used in the context of replication
and migration. If it is used in these contexts, the server must
ensure that no matter what values the client presents for the
substituted variables, the result is always a valid successor file
system instance to that from which a transition is occurring, i.e.,
that the data is identical or represents a later image of a writable
file system.
Note that when fli_rootpath is a null pathname (that is, one with
zero components), the file system designated is at the root of the
specified server, whether or not the FSLI4IF_VAR_SUB flag within the
associated fs_locations_info4 structure is set.
13. Changes to RFC5661 outside Section 11
Beside the major rework of Section 11, there are a number of related Beside the major rework of Section 11, there are a number of related
changes that are necessary: changes that are necessary:
o The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to o The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to
be revised to reflect the changes called for in Section 4 of the be revised to reflect the changes called for in Section 4 of the
current document. The updated summary appears as Section 12.1 current document. The updated summary appears as Section 13.1
below. below.
o The discussion of server scope which appeared in Section 2.10.4 of o The discussion of server scope which appeared in Section 2.10.4 of
[RFC5661] needs to be replaced, since the existing text appears to [RFC5661] needs to be replaced, since the existing text appears to
require a level of inter-server co-ordination incompatible with require a level of inter-server co-ordination incompatible with
its basic function of avoiding the need for a globally uniform its basic function of avoiding the need for a globally uniform
means of assigning server_owner values. A revised treatment means of assigning server_owner values. A revised treatment
appears Section 12.2 below. appears in Section 13.2 below.
o While the last paragraph (exclusive of sub-sections) of o While the last paragraph (exclusive of sub-sections) of
Section 2.10.5 in [RFC5661], dealing with server_owner changes, is Section 2.10.5 in [RFC5661], dealing with server_owner changes, is
literally true, it has been a source of confusion. Since the literally true, it has been a source of confusion. Since the
existing paragraph can be read as suggesting that such changes be existing paragraph can be read as suggesting that such changes be
dealt with non-disruptively, the treatment in Section 12.4 below dealt with non-disruptively, the treatment in Section 13.4 below
needs to be substituted. needs to be substituted.
o The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of o The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of
[RFC5661]) needs to be updated to reflect the different handling [RFC5661]) needs to be updated to reflect the different handling
of unavailability of a particular fs via a specific network of unavailability of a particular fs via a specific network
address. Since such a situation is no longer considered to address. Since such a situation is no longer considered to
constitute unavailability of a file system instance, the constitute unavailability of a file system instance, the
description needs to change even though the instances in which it description needs to change even though the set of circumstances
is returned remain the same. The updated description appears in in which it is to be returned remain the same. The updated
Section 12.3 below. description appears in Section 13.3 below.
o The existing treatment of EXCHANGE_ID (in Section 18.35 of o The existing treatment of EXCHANGE_ID (in Section 18.35 of
[RFC5661]) assumes that client IDs cannot be created/ confirmed [RFC5661]) assumes that client IDs cannot be created/ confirmed
other than by the EXCHANGE_ID and CREATE_SESSION operations. other than by the EXCHANGE_ID and CREATE_SESSION operations.
Also, the necessary use of EXCHANGE_ID in recovery from migration Also, the necessary use of EXCHANGE_ID in recovery from migration
and related situations is not addressed clearly. A revised and related situations is not addressed clearly. A revised
treatment of EXCHANGE_ID is necessary and it appears in Section 13 treatment of EXCHANGE_ID is necessary and it appears in Section 14
below while the specific differences between it and the treatment below while the specific differences between it and the treatment
within [RFC5661] are explained in Section 12.5 below. within [RFC5661] are explained in Section 13.5 below.
12.1. (Introduction to) Multi-Server Namespace (as updated) o The existing treatment of RECLAIM_COMPLETE (in Section 18.51 of
[RFC5661]) is not sufficiently clear about the purpose and use of
the rca_one_fs and how the server is to deal with inappropriate
values of this argument. Because the resulting confusion raises
interoperability issues, a new treatment of RECLAIM_COMPLETE is
necessary and it appears in Section 15 below while the specific
differences between it and the treatment within [RFC5661] are
discussed in Section 13.6 below. In addition, the definitions of
the reclaim-related errors receive an updated treatment in
Section 13.7 to reflect the fact that there are multiple contexts
for lock reclaim operations.
13.1. (Introduction to) Multi-Server Namespace (as updated)
NFSv4.1 contains a number of features to allow implementation of NFSv4.1 contains a number of features to allow implementation of
namespaces that cross server boundaries and that allow and facilitate namespaces that cross server boundaries and that allow and facilitate
a non-disruptive transfer of support for individual file systems a non-disruptive transfer of support for individual file systems
between servers. They are all based upon attributes that allow one between servers. They are all based upon attributes that allow one
file system to specify alternate, additional, and new location file system to specify alternate, additional, and new location
information which specifies how the client may access that information which specifies how the client may access that
file system. file system.
These attributes can be used to provide for individual active file These attributes can be used to provide for individual active file
skipping to change at page 45, line 42 skipping to change at page 62, line 32
o Alternate network addresses to access the current file system o Alternate network addresses to access the current file system
instance. instance.
o The locations of alternate file system instances or replicas to be o The locations of alternate file system instances or replicas to be
used in the event that the current file system instance becomes used in the event that the current file system instance becomes
unavailable. unavailable.
These attributes may be used together with the concept of absent file These attributes may be used together with the concept of absent file
systems, in which a position in the server namespace is associated systems, in which a position in the server namespace is associated
with locations on other servers without any file system instance on with locations on other servers without there being any corresponding
the current server. file system instance on the current server.
o Location attributes may be used with absent file systems to o Location attributes may be used with absent file systems to
implement referrals whereby one server may direct the client to a implement referrals whereby one server may direct the client to a
file system provided by another server. This allows extensive file system provided by another server. This allows extensive
multi-server namespaces to be constructed. multi-server namespaces to be constructed.
o Location attributes may be provided when a previously present file o Location attributes may be provided when a previously present file
system becomes absent. This allows non-disruptive migration of system becomes absent. This allows non-disruptive migration of
file systems to alternate servers. file systems to alternate servers.
12.2. Server Scope (as updated) 13.2. Server Scope (as updated)
Servers each specify a server scope value in the form of an opaque Servers each specify a server scope value in the form of an opaque
string eir_server_scope returned as part of the results of an string eir_server_scope returned as part of the results of an
EXCHANGE_ID operation. The purpose of the server scope is to allow a EXCHANGE_ID operation. The purpose of the server scope is to allow a
group of servers to indicate to clients that a set of servers sharing group of servers to indicate to clients that a set of servers sharing
the same server scope value has arranged to use compatible values of the same server scope value has arranged to use compatible values of
otherwise opaque identifiers. Thus, the identifiers generated by two otherwise opaque identifiers. Thus, the identifiers generated by two
servers within that set can be assumed compatible so that, in some servers within that set can be assumed compatible so that, in some
cases, identifiers generated by one server in that set may be cases, identifiers generated by one server in that set may be
presented to another server of the same scope. presented to another server of the same scope.
skipping to change at page 47, line 39 skipping to change at page 64, line 27
first server for the fs_locations or fs_locations_info attribute first server for the fs_locations or fs_locations_info attribute
with RPCSEC_GSS authentication. It may need to do this in advance with RPCSEC_GSS authentication. It may need to do this in advance
of the need to verify the common server scope. If the client of the need to verify the common server scope. If the client
successfully authenticates the reply to GETATTR, and the GETATTR successfully authenticates the reply to GETATTR, and the GETATTR
request and reply containing the fs_locations or fs_locations_info request and reply containing the fs_locations or fs_locations_info
attribute refers to the second server, then the equality of server attribute refers to the second server, then the equality of server
scope is supported. A client may choose to limit the use of this scope is supported. A client may choose to limit the use of this
form of support to information relevant to the specific file form of support to information relevant to the specific file
system involved (e.g. a file system being migrated). system involved (e.g. a file system being migrated).
12.3. Revised Treatment of NFS4ERR_MOVED 13.3. Revised Treatment of NFS4ERR_MOVED
Because the term "replica" is now used differently, the current Because of the need to appropriately address trunking-related issues,
description of NFS4ERR_MOVED needs to be changed to the one below. some uses of the term "replica" in [RFC5661] have become problematic
The new paragraph explicitly recognizes that a different network since a shift in network access paths was considered to be a shift to
address might be used, while the previous description, misleadingly, a different replica. As a result, the description of NFS4ERR_MOVED
treated this as a shift between two replicas while only a single file in [RFC5661] needs to be changed to the one below. The new paragraph
system instance might be involved. explicitly recognizes that a different network address might be used,
while the previous description, misleadingly, treated this as a shift
between two replicas while only a single file system instance might
be involved.
The file system that contains the current filehandle object is not The file system that contains the current filehandle object is not
accessible using the address on which the request was made. It accessible using the address on which the request was made. It
still might be accessible using other addresses server-trunkable still might be accessible using other addresses server-trunkable
with it or it might not be present at the server. In the latter with it or it might not be present at the server. In the latter
case, it might have been relocated or migrated to another server, case, it might have been relocated or migrated to another server,
or it might have never been present. The client may obtain or it might have never been present. The client may obtain
information regarding access to the file system location by information regarding access to the file system location by
obtaining the "fs_locations" or "fs_locations_info" attribute for obtaining the "fs_locations" or "fs_locations_info" attribute for
the current filehandle. For further discussion, refer to the current filehandle. For further discussion, refer to
Section 11 of [RFC5661], as modified by the current document. Section 11 of [RFC5661], as modified by the current document.
12.4. Revised Discussion of Server_owner changes 13.4. Revised Discussion of Server_owner changes
Because of problems with the treatment of such changes, the confusing Because of likely problems with the treatment of such changes, a
paragraph, which simply says that such changes need to be dealt with, confusing paragraph which appears at the end of Section 2.10.5 of
is to be replaced by the one below. [RFC5661], which simply says that such changes need to be dealt with,
is to be replaced by the material below.
It is always possible that, as a result of various sorts of It is always possible that, as a result of various sorts of
reconfiguration events, eir_server_scope and eir_server_owner reconfiguration events, eir_server_scope and eir_server_owner
values may be different on subsequent EXCHANGE_ID requests made to values may be different on subsequent EXCHANGE_ID requests made to
the same network address. the same network address.
In most cases such reconfiguration events will be disruptive and In most cases such reconfiguration events will be disruptive and
indicate that an IP address formerly connected to one server is indicate that an IP address formerly connected to one server is
now connected to an entirely different one. now connected to an entirely different one.
skipping to change at page 49, line 5 skipping to change at page 65, line 43
eir_server_owner.so_major_id changes, the client can use eir_server_owner.so_major_id changes, the client can use
filehandles it has and attempt reclaims. It may find that filehandles it has and attempt reclaims. It may find that
these are now stale but if NFS4ERR_STALE is not received, he these are now stale but if NFS4ERR_STALE is not received, he
can proceed to reclaim his opens. can proceed to reclaim his opens.
* When eir_server_scope and eir_server_owner.so_major_id remain * When eir_server_scope and eir_server_owner.so_major_id remain
the same, the client has to use the now-current values of the same, the client has to use the now-current values of
eir_server_owner.so_minor_id in deciding on appropriate forms eir_server_owner.so_minor_id in deciding on appropriate forms
of trunking. of trunking.
12.5. Revision to Treatment of EXCHANGE_ID 13.5. Revision to Treatment of EXCHANGE_ID
There are a number of issues in the original treatment of EXCHANGE_ID There are a number of issues in the original treatment of EXCHANGE_ID
(in [RFC5661]) that cause problems for Transparent State Migration (in [RFC5661]) that cause problems for Transparent State Migration
and for the transfer of access between different network access paths and for the transfer of access between different network access paths
to the same file system instance. to the same file system instance.
These issues arise from the fact that this treatment was written: These issues arise from the fact that this treatment was written:
o assuming that a client ID can only become known to a server by o Assuming that a client ID can only become known to a server by
having been created by executing an EXCHANGE_ID, with confirmation having been created by executing an EXCHANGE_ID, with confirmation
of the ID only possible by execution of a CREATE_SESSION. of the ID only possible by execution of a CREATE_SESSION.
o Considering the interactions between a client and a server only on o Considering the interactions between a client and a server only on
a single network address a single network address
As these assumptions have become invalid in the context of As these assumptions have become invalid in the context of
Transparent State Migration and active use of trunking, the treatment Transparent State Migration and active use of trunking, the treatment
has been modified in several respects. has been modified in several respects.
o It had been assumed that an EXCHANGE_ID executed when the server o It had been assumed that an EXCHANGE_ID executed when the server
is already aware of a given client instance must be either is already aware of a given client instance must be either
updating associated parameters (e.g. with respect to callbacks) or updating associated parameters (e.g. with respect to callbacks) or
a lingering retransmission to deal with a previously lost reply. a lingering retransmission to deal with a previously lost reply.
As a result, any slot sequence returned would be of no use. The As a result, any slot sequence returned by that operation would be
existing treatment goes so far as to say that it "MUST NOT" be of no use. The existing treatment goes so far as to say that it
used, although this usage is not in accord with [RFC2119]. This "MUST NOT" be used, although this usage is not in accord with
created a difficulty when an EXCHANGE_ID is done after Transparent [RFC2119]. This created a difficulty when an EXCHANGE_ID is done
State Migration since that slot sequence needs to be used in a after Transparent State Migration since that slot sequence would
subsequent CREATE_SESSION. need to be used in a subsequent CREATE_SESSION.
In the updated treatment, CREATE_SESSION is a way that client IDs In the updated treatment, CREATE_SESSION is a way that client IDs
are confirmed but it is understood that other ways are possible. are confirmed but it is understood that other ways are possible.
The slot sequence can be used as needed and cases in which it The slot sequence can be used as needed and cases in which it
would be of no use are appropriately noted. would be of no use are appropriately noted.
o It was assumed that the only functions of EXCHANGE_ID were to o It was assumed that the only functions of EXCHANGE_ID were to
inform the server of the client, create the client ID, and inform the server of the client, create the client ID, and
communicate it to the client. When multiple simultaneous communicate it to the client. When multiple simultaneous
connections are involved, as often happens when trunking, that connections are involved, as often happens when trunking, that
skipping to change at page 50, line 7 skipping to change at page 66, line 47
EXCHANGE_ID in associating the client ID with the connection on EXCHANGE_ID in associating the client ID with the connection on
which it was done, so that it could be used by a subsequent which it was done, so that it could be used by a subsequent
CREATE_SESSION, whose parameters do not include an explicit CREATE_SESSION, whose parameters do not include an explicit
client ID. client ID.
The new treatment explicitly discusses the role of EXCHANGE_ID in The new treatment explicitly discusses the role of EXCHANGE_ID in
associating the client ID with the connection so it can be used by associating the client ID with the connection so it can be used by
CREATE_SESSION and in associating a connection with an existing CREATE_SESSION and in associating a connection with an existing
session. session.
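The association between a connection and a client ID can be pictured with a small sketch. The ClientIdBindings class and its method names are hypothetical and serve only as an illustration; a real server keeps considerably more state per connection.

```python
class ClientIdBindings:
    """Tracks which client ID each connection was last bound to."""

    def __init__(self):
        self._by_connection = {}  # connection handle -> client ID

    def exchange_id(self, connection, client_id):
        # EXCHANGE_ID associates the client ID with the connection
        # on which the operation was done.
        self._by_connection[connection] = client_id

    def create_session(self, connection):
        # CREATE_SESSION carries no explicit client ID in this model,
        # so the binding made by EXCHANGE_ID is looked up instead.
        if connection not in self._by_connection:
            raise LookupError("no client ID bound to this connection")
        return self._by_connection[connection]
```

With several simultaneous connections, as in trunking, each connection carries its own binding, which is why the association is tracked per connection in this sketch.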
The new treatment can be found in Section 13 below. It is intended The new treatment can be found in Section 14 below. It is intended
to supersede the treatment in Section 18.35 of [RFC5661]. Publishing to supersede the treatment in Section 18.35 of [RFC5661]. Publishing
a complete replacement for Section 18.35 allows the corrected a complete replacement for Section 18.35 allows the corrected
definition to be read as a whole once [RFC5661] is updated. definition to be read as a whole once [RFC5661] is updated.
13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated) 13.6. Revision to Treatment of RECLAIM_COMPLETE
The following changes were made to the treatment of RECLAIM_COMPLETE
in [RFC5661] to arrive at the treatment in Section 15.
o In a number of places the text is more explicit about the purpose
of rca_one_fs and its connection to file system migration.
o There is a discussion of situations in which either form of
RECLAIM_COMPLETE would need to be done.
o There is a discussion of interoperability issues that result from
implementations that may have arisen due to the lack of clarity of
the previous treatment of RECLAIM_COMPLETE.
13.7. Reclaim Errors (as updated)
These errors relate to the process of reclaiming locks after a server
restart or in connection with the migration of a file system (i.e. in
the case in which rca_one_fs is TRUE).
13.7.1. NFS4ERR_COMPLETE_ALREADY (as updated; Error Code 10054)
The client previously sent a successful RECLAIM_COMPLETE operation
specifying the same scope, whether that scope is global or for the
same file system in the case of a per-fs RECLAIM_COMPLETE. An
additional RECLAIM_COMPLETE operation is not necessary and results in
this error.
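A minimal sketch of how a server might detect a repeated RECLAIM_COMPLETE for one client follows. The ReclaimTracker class is hypothetical; only the error code 10054 is taken from the text above, and the file system identifier is whatever the server uses to name the current file system.

```python
NFS4_OK = 0
NFS4ERR_COMPLETE_ALREADY = 10054


class ReclaimTracker:
    """Per-client record of which reclaim scopes have been completed."""

    def __init__(self):
        self._global_done = False
        self._fs_done = set()

    def reclaim_complete(self, rca_one_fs, current_fs=None):
        if rca_one_fs:
            # Per-fs form: the scope is the current file system.
            if current_fs in self._fs_done:
                return NFS4ERR_COMPLETE_ALREADY
            self._fs_done.add(current_fs)
            return NFS4_OK
        # Global form: the scope is the whole server.
        if self._global_done:
            return NFS4ERR_COMPLETE_ALREADY
        self._global_done = True
        return NFS4_OK
```

The global and per-fs scopes are tracked independently here, so completing one does not by itself cause the other to return the error.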
13.7.2. NFS4ERR_GRACE (as updated; Error Code 10013)
The server was in its recovery or grace period, with regard to the
file system object for which the lock was requested. The locking
request was not a reclaim request and so could not be granted during
that period.
13.7.3. NFS4ERR_NO_GRACE (as updated; Error Code 10033)
A reclaim of client state was attempted in circumstances in which the
server cannot guarantee that conflicting state has not been provided
to another client. This can occur because the reclaim has been done
outside of a grace period implemented by the server, after the
client has done a RECLAIM_COMPLETE operation which ends its ability
to reclaim the requested lock, or because previous operations have
created a situation in which the server is not able to determine that
a reclaim-interfering edge condition does not exist.
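The distinction between NFS4ERR_GRACE and NFS4ERR_NO_GRACE can be summarized in a short sketch. The check_lock_request function and its boolean parameters are hypothetical simplifications. The third NFS4ERR_NO_GRACE condition above (an undetermined reclaim-interfering edge condition) is not modeled.

```python
NFS4_OK = 0
NFS4ERR_GRACE = 10013
NFS4ERR_NO_GRACE = 10033


def check_lock_request(is_reclaim, in_grace, reclaim_complete_done):
    """Decide whether a locking request may proceed.

    is_reclaim: the request is a reclaim of previously held state.
    in_grace: the server is in a grace period for the object's file system.
    reclaim_complete_done: the client already sent a RECLAIM_COMPLETE
        covering the requested lock.
    """
    if not is_reclaim:
        # Non-reclaim requests cannot be granted during the grace period.
        return NFS4ERR_GRACE if in_grace else NFS4_OK
    # Reclaims are only safe inside a grace period and before the
    # client has declared its reclaim activity complete.
    if not in_grace or reclaim_complete_done:
        return NFS4ERR_NO_GRACE
    return NFS4_OK
```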
13.7.4. NFS4ERR_RECLAIM_BAD (as updated; Error Code 10034)
The server has determined that a reclaim attempted by the client is
not valid, i.e. the lock specified as being reclaimed could not
possibly have existed before the server restart or file system
migration event. A server is not obliged to make this determination
and will typically rely on the client to only reclaim locks that the
client was granted prior to restart or file system migration.
However, when a server does have reliable information to enable it
to make this determination, this error indicates that the reclaim has
been rejected as invalid. This is as opposed to the error
NFS4ERR_RECLAIM_CONFLICT (see Section 13.7.5) where the server can
only determine that there has been an invalid reclaim, but cannot
determine which request is invalid.
13.7.5. NFS4ERR_RECLAIM_CONFLICT (as updated; Error Code 10035)
The reclaim attempted by the client has encountered a conflict and
cannot be satisfied. This potentially indicates a misbehaving client,
although not necessarily the one receiving the error. The
misbehavior might be on the part of the client that established the
lock with which this client conflicted. See also Section 13.7.4 for
the related error, NFS4ERR_RECLAIM_BAD.
14. Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated)
The EXCHANGE_ID exchanges long-hand client and server identifiers The EXCHANGE_ID exchanges long-hand client and server identifiers
(owners), and provides access to a client ID, creating one if (owners), and provides access to a client ID, creating one if
necessary. This client ID becomes associated with the connection on necessary. This client ID becomes associated with the connection on
which the operation is done, so that it is available when a which the operation is done, so that it is available when a
CREATE_SESSION is done or when the connection is used to issue a CREATE_SESSION is done or when the connection is used to issue a
request on an existing session associated with the current client. request on an existing session associated with the current client.
13.1. ARGUMENT 14.1. ARGUMENT
<CODE BEGINS>
const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001;
const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002;
const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100;
const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000;
const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000;
const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000;
skipping to change at page 51, line 24 skipping to change at page 69, line 42
ssv_sp_parms4 spa_ssv_parms; ssv_sp_parms4 spa_ssv_parms;
}; };
struct EXCHANGE_ID4args { struct EXCHANGE_ID4args {
client_owner4 eia_clientowner; client_owner4 eia_clientowner;
uint32_t eia_flags; uint32_t eia_flags;
state_protect4_a eia_state_protect; state_protect4_a eia_state_protect;
nfs_impl_id4 eia_client_impl_id<1>; nfs_impl_id4 eia_client_impl_id<1>;
}; };
13.2. RESULT <CODE ENDS>
14.2. RESULT
<CODE BEGINS>
struct ssv_prot_info4 { struct ssv_prot_info4 {
state_protect_ops4 spi_ops; state_protect_ops4 spi_ops;
uint32_t spi_hash_alg; uint32_t spi_hash_alg;
uint32_t spi_encr_alg; uint32_t spi_encr_alg;
uint32_t spi_ssv_len; uint32_t spi_ssv_len;
uint32_t spi_window; uint32_t spi_window;
gsshandle4_t spi_handles<>; gsshandle4_t spi_handles<>;
}; };
union state_protect4_r switch(state_protect_how4 spr_how) { union state_protect4_r switch(state_protect_how4 spr_how) {
skipping to change at page 52, line 40 skipping to change at page 70, line 42
}; };
union EXCHANGE_ID4res switch (nfsstat4 eir_status) { union EXCHANGE_ID4res switch (nfsstat4 eir_status) {
case NFS4_OK: case NFS4_OK:
EXCHANGE_ID4resok eir_resok4; EXCHANGE_ID4resok eir_resok4;
default: default:
void; void;
}; };
13.3. DESCRIPTION <CODE ENDS>
14.3. DESCRIPTION
The client uses the EXCHANGE_ID operation to register a particular The client uses the EXCHANGE_ID operation to register a particular
client_owner with the server. However, when the client_owner has client_owner with the server. However, when the client_owner has
already been registered by other means (e.g. Transparent State already been registered by other means (e.g. Transparent State
Migration), the client may still use EXCHANGE_ID to obtain the client Migration), the client may still use EXCHANGE_ID to obtain the client
ID assigned previously. ID assigned previously.
The client ID returned from this operation will be associated with The client ID returned from this operation will be associated with
the connection on which the EXCHANGE_ID is received and will serve as the connection on which the EXCHANGE_ID is received and will serve as
a parent object for sessions created by the client on this connection a parent object for sessions created by the client on this connection
skipping to change at page 53, line 41 skipping to change at page 71, line 46
The eia_clientowner field is composed of a co_verifier field and a The eia_clientowner field is composed of a co_verifier field and a
co_ownerid string. As noted in section 2.4 of [RFC5661], the co_ownerid string. As noted in section 2.4 of [RFC5661], the
co_ownerid describes the client, and the co_verifier is the co_ownerid describes the client, and the co_verifier is the
incarnation of the client. An EXCHANGE_ID sent with a new incarnation of the client. An EXCHANGE_ID sent with a new
incarnation of the client will lead to the server removing lock state incarnation of the client will lead to the server removing lock state
of the old incarnation. Whereas an EXCHANGE_ID sent with the current of the old incarnation. Whereas an EXCHANGE_ID sent with the current
incarnation and co_ownerid will result in an error or an update of incarnation and co_ownerid will result in an error or an update of
the client ID's properties, depending on the arguments to the client ID's properties, depending on the arguments to
EXCHANGE_ID. EXCHANGE_ID.
A server MUST NOT use the same client ID for two different A server MUST NOT provide the same client ID to two different
incarnations of an eia_clientowner. incarnations of an eia_clientowner.
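The incarnation rule can be illustrated with a deliberately simplified sketch. The ClientIdAllocator class is hypothetical. It ignores the confirmed and unconfirmed record states that the IMPLEMENTATION section works through, and only demonstrates that a changed co_verifier yields a fresh client ID.

```python
import itertools


class ClientIdAllocator:
    """Hands out client IDs, one per (co_ownerid, co_verifier) incarnation."""

    def __init__(self):
        self._next = itertools.count(1)
        self._current = {}  # co_ownerid -> (co_verifier, client_id)

    def client_id_for(self, co_ownerid, co_verifier):
        """Return (client_id, replaced_old_incarnation)."""
        known = self._current.get(co_ownerid)
        if known is not None and known[0] == co_verifier:
            # Same incarnation: the existing client ID is returned.
            return known[1], False
        # New owner or new incarnation: never reuse the old client ID.
        client_id = next(self._next)
        self._current[co_ownerid] = (co_verifier, client_id)
        return client_id, known is not None
```

When the second element of the result is true, a server following this model would go on to remove the lock state of the old incarnation.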
In addition to the client ID and sequence ID, the server returns a In addition to the client ID and sequence ID, the server returns a
server owner (eir_server_owner) and server scope (eir_server_scope). server owner (eir_server_owner) and server scope (eir_server_scope).
The former field is used for network trunking as described in The former field is used in connection with network trunking as
Section 2.10.5 of [RFC5661]. The latter field is used to allow described in Section 2.10.5 of [RFC5661]. The latter field is used
clients to determine when client IDs sent by one server may be to allow clients to determine when client IDs sent by one server may
recognized by another in the event of file system migration (see be recognized by another in the event of file system migration (see
Section 8.9 of the current document). Section 8.9 of the current document).
The client ID returned by EXCHANGE_ID is only unique relative to the The client ID returned by EXCHANGE_ID is only unique relative to the
combination of eir_server_owner.so_major_id and eir_server_scope. combination of eir_server_owner.so_major_id and eir_server_scope.
Thus, if two servers return the same client ID, the onus is on the Thus, if two servers return the same client ID, the onus is on the
client to distinguish the client IDs on the basis of client to distinguish the client IDs on the basis of
eir_server_owner.so_major_id and eir_server_scope. In the event two eir_server_owner.so_major_id and eir_server_scope. In the event two
different servers claim matching eir_server_owner.so_major_id and different servers claim matching eir_server_owner.so_major_id and
eir_server_scope, the client can use the verification techniques eir_server_scope, the client can use the verification techniques
discussed in Section 2.10.5 of [RFC5661] to determine if the servers discussed in Section 2.10.5 of [RFC5661] to determine if the servers
skipping to change at page 54, line 39 skipping to change at page 72, line 42
* EXCHGID4_FLAG_SUPP_MOVED_MIGR * EXCHGID4_FLAG_SUPP_MOVED_MIGR
* EXCHGID4_FLAG_BIND_PRINC_STATEID * EXCHGID4_FLAG_BIND_PRINC_STATEID
* EXCHGID4_FLAG_USE_NON_PNFS * EXCHGID4_FLAG_USE_NON_PNFS
* EXCHGID4_FLAG_USE_PNFS_MDS * EXCHGID4_FLAG_USE_PNFS_MDS
* EXCHGID4_FLAG_USE_PNFS_DS * EXCHGID4_FLAG_USE_PNFS_DS
These properties may be updated by subsequent EXCHANGE_ID requests These properties may be updated by subsequent EXCHANGE_ID
on confirmed client IDs though the server MAY refuse to change operations on confirmed client IDs though the server MAY refuse to
them. change them.
o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, o The state protection method used, one of SP4_NONE, SP4_MACH_CRED,
or SP4_SSV, as set by the spa_how field of the arguments to or SP4_SSV, as set by the spa_how field of the arguments to
EXCHANGE_ID. Once the client ID is confirmed, this property EXCHANGE_ID. Once the client ID is confirmed, this property
cannot be updated by subsequent EXCHANGE_ID requests. cannot be updated by subsequent EXCHANGE_ID operations.
o For SP4_MACH_CRED or SP4_SSV state protection: o For SP4_MACH_CRED or SP4_SSV state protection:
* The list of operations (spo_must_enforce) that MUST use the * The list of operations (spo_must_enforce) that MUST use the
specified state protection. This list comes from the results specified state protection. This list comes from the results
of EXCHANGE_ID. of EXCHANGE_ID.
* The list of operations (spo_must_allow) that MAY use the * The list of operations (spo_must_allow) that MAY use the
specified state protection. This list comes from the results specified state protection. This list comes from the results
of EXCHANGE_ID. of EXCHANGE_ID.
skipping to change at page 55, line 28 skipping to change at page 73, line 32
* The OID of the encryption algorithm. This property is * The OID of the encryption algorithm. This property is
represented by one of the algorithms in the ssp_encr_algs field represented by one of the algorithms in the ssp_encr_algs field
of the EXCHANGE_ID arguments. Once the client ID is confirmed, of the EXCHANGE_ID arguments. Once the client ID is confirmed,
this property cannot be updated by subsequent EXCHANGE_ID this property cannot be updated by subsequent EXCHANGE_ID
requests. requests.
* The length of the SSV. This property is represented by the * The length of the SSV. This property is represented by the
spi_ssv_len field in the EXCHANGE_ID results. Once the client spi_ssv_len field in the EXCHANGE_ID results. Once the client
ID is confirmed, this property cannot be updated by subsequent ID is confirmed, this property cannot be updated by subsequent
EXCHANGE_ID requests. EXCHANGE_ID operations.
There are REQUIRED and RECOMMENDED relationships among the There are REQUIRED and RECOMMENDED relationships among the
length of the key of the encryption algorithm ("key length"), length of the key of the encryption algorithm ("key length"),
the length of the output of hash algorithm ("hash length"), and the length of the output of hash algorithm ("hash length"), and
the length of the SSV ("SSV length"). the length of the SSV ("SSV length").
+ key length MUST be <= hash length. This is because the keys + key length MUST be <= hash length. This is because the keys
used for the encryption algorithm are actually subkeys used for the encryption algorithm are actually subkeys
derived from the SSV, and the derivation is via the hash derived from the SSV, and the derivation is via the hash
algorithm. The selection of an encryption algorithm with a algorithm. The selection of an encryption algorithm with a
skipping to change at page 56, line 14 skipping to change at page 74, line 17
+ key length SHOULD be >= hash length / 2. This is because + key length SHOULD be >= hash length / 2. This is because
the subkey derivation is via an HMAC and it is recommended the subkey derivation is via an HMAC and it is recommended
that if the HMAC has to be truncated, it should not be that if the HMAC has to be truncated, it should not be
truncated to less than half the hash length (see Section 4 truncated to less than half the hash length (see Section 4
of RFC2104 [RFC2104]). of RFC2104 [RFC2104]).
* Number of concurrent versions of the SSV the client and server * Number of concurrent versions of the SSV the client and server
will support (see Section 2.10.9 of [RFC5661]). This property will support (see Section 2.10.9 of [RFC5661]). This property
is represented by spi_window in the EXCHANGE_ID results. The is represented by spi_window in the EXCHANGE_ID results. The
property may be updated by subsequent EXCHANGE_ID requests. property may be updated by subsequent EXCHANGE_ID operations.
o The client's implementation ID as represented by the o The client's implementation ID as represented by the
eia_client_impl_id field of the arguments. The property may be eia_client_impl_id field of the arguments. The property may be
updated by subsequent EXCHANGE_ID requests. updated by subsequent EXCHANGE_ID requests.
o The server's implementation ID as represented by the o The server's implementation ID as represented by the
eir_server_impl_id field of the reply. The property may be eir_server_impl_id field of the reply. The property may be
updated by replies to subsequent EXCHANGE_ID requests. updated by replies to subsequent EXCHANGE_ID requests.
The eia_flags passed as part of the arguments and the eir_flags The eia_flags passed as part of the arguments and the eir_flags
skipping to change at page 56, line 51 skipping to change at page 75, line 5
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means
that the client is attempting to update properties of an existing that the client is attempting to update properties of an existing
confirmed client ID (if the client wants to update properties of an confirmed client ID (if the client wants to update properties of an
unconfirmed client ID, it MUST NOT set unconfirmed client ID, it MUST NOT set
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that
the client send the update EXCHANGE_ID operation in the same COMPOUND the client send the update EXCHANGE_ID operation in the same COMPOUND
as a SEQUENCE so that the EXCHANGE_ID is executed exactly once. as a SEQUENCE so that the EXCHANGE_ID is executed exactly once.
Whether the client can update the properties of client ID depends on Whether the client can update the properties of client ID depends on
the state protection it selected when the client ID was created, and the state protection it selected when the client ID was created, and
the principal and security flavor it uses when sending the the principal and security flavor it uses when sending the
EXCHANGE_ID request. The situations described in items 6, 7, 8, or 9 EXCHANGE_ID operation. The situations described in items 6, 7, 8, or
of the second numbered list of Section 13.4 below will apply. Note 9 of the second numbered list of Section 14.4 below will apply. Note
that if the operation succeeds and returns a client ID that is that if the operation succeeds and returns a client ID that is
already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R
bit in eir_flags. bit in eir_flags.
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this
means that the client is trying to establish a new client ID; it is means that the client is trying to establish a new client ID; it is
attempting to trunk data communication to the server (See attempting to trunk data communication to the server (See
Section 2.10.5 of [RFC5661]); or it is attempting to update Section 2.10.5 of [RFC5661]); or it is attempting to update
properties of an unconfirmed client ID. The situations described in properties of an unconfirmed client ID. The situations described in
items 1, 2, 3, 4, or 5 of the second numbered list of Section 13.4 items 1, 2, 3, 4, or 5 of the second numbered list of Section 14.4
below will apply. Note that if the operation succeeds and returns a below will apply. Note that if the operation succeeds and returns a
client ID that was previously confirmed, the server MUST set the client ID that was previously confirmed, the server MUST set the
EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags.
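The way the update flag selects among the numbered cases can be sketched as below. The two helper functions are hypothetical. The flag values are those defined in the XDR of [RFC5661]; they fall in the portion of the ARGUMENT listing that is elided in this comparison.

```python
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000


def applicable_cases(eia_flags):
    """Return the item numbers of the second numbered list that may apply."""
    if eia_flags & EXCHGID4_FLAG_UPD_CONFIRMED_REC_A:
        # Update of an existing confirmed client ID.
        return (6, 7, 8, 9)
    # New client ID, trunking, or update of an unconfirmed client ID.
    return (1, 2, 3, 4, 5)


def reply_flags(eir_flags, client_id_confirmed):
    """Set EXCHGID4_FLAG_CONFIRMED_R when the returned client ID is confirmed."""
    if client_id_confirmed:
        return eir_flags | EXCHGID4_FLAG_CONFIRMED_R
    return eir_flags
```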
When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client
indicates that it is capable of dealing with an NFS4ERR_MOVED error indicates that it is capable of dealing with an NFS4ERR_MOVED error
as part of a referral sequence. When this bit is not set, it is as part of a referral sequence. When this bit is not set, it is
still legal for the server to perform a referral sequence. However, still legal for the server to perform a referral sequence. However,
a server may use the fact that the client is incapable of correctly a server may use the fact that the client is incapable of correctly
responding to a referral, by avoiding it for that particular client. responding to a referral, by avoiding it for that particular client.
skipping to change at page 58, line 33 skipping to change at page 76, line 36
The spa_how field of the eia_state_protect field specifies how the The spa_how field of the eia_state_protect field specifies how the
client wants to protect its client, locking, and session states from client wants to protect its client, locking, and session states from
unauthorized changes (Section 2.10.8.3 of [RFC5661]): unauthorized changes (Section 2.10.8.3 of [RFC5661]):
o SP4_NONE. The client does not request the NFSv4.1 server to o SP4_NONE. The client does not request the NFSv4.1 server to
enforce state protection. The NFSv4.1 server MUST NOT enforce enforce state protection. The NFSv4.1 server MUST NOT enforce
state protection for the returned client ID. state protection for the returned client ID.
o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST
send the EXCHANGE_ID request with RPCSEC_GSS as the security send the EXCHANGE_ID operation with RPCSEC_GSS as the security
flavor, and with a service of RPC_GSS_SVC_INTEGRITY or flavor, and with a service of RPC_GSS_SVC_INTEGRITY or
RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the
client wants to use an RPCSEC_GSS-based machine credential to client wants to use an RPCSEC_GSS-based machine credential to
protect its state. The server MUST note the principal the protect its state. The server MUST note the principal the
EXCHANGE_ID operation was sent with, and the GSS mechanism used. EXCHANGE_ID operation was sent with, and the GSS mechanism used.
These notes collectively comprise the machine credential. These notes collectively comprise the machine credential.
After the client ID is confirmed, as long as the lease associated After the client ID is confirmed, as long as the lease associated
with the client ID is unexpired, a subsequent EXCHANGE_ID with the client ID is unexpired, a subsequent EXCHANGE_ID
operation that uses the same eia_clientowner.co_ownerid as the first operation that uses the same eia_clientowner.co_ownerid as the first
EXCHANGE_ID MUST also use the same machine credential as the first EXCHANGE_ID MUST also use the same machine credential as the first
EXCHANGE_ID. The server returns the same client ID for the EXCHANGE_ID. The server returns the same client ID for the
subsequent EXCHANGE_ID as that returned from the first subsequent EXCHANGE_ID as that returned from the first
EXCHANGE_ID. EXCHANGE_ID.
o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the
EXCHANGE_ID request with RPCSEC_GSS as the security flavor, and EXCHANGE_ID operation with RPCSEC_GSS as the security flavor, and
with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY.
If SP4_SSV is specified, then the client wants to use the SSV to If SP4_SSV is specified, then the client wants to use the SSV to
protect its state. The server records the credential used in the protect its state. The server records the credential used in the
request as the machine credential (as defined above) for the request as the machine credential (as defined above) for the
eia_clientowner.co_ownerid. The CREATE_SESSION operation that eia_clientowner.co_ownerid. The CREATE_SESSION operation that
confirms the client ID MUST use the same machine credential. confirms the client ID MUST use the same machine credential.
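The security flavor requirement for each state protection method can be checked as in the following sketch. The flavor_acceptable function is hypothetical. The numeric values are assumed to match the state_protect_how4 enumeration in [RFC5661] and the flavor and service numbers in [RFC2203].

```python
from enum import IntEnum


class StateProtectHow(IntEnum):
    SP4_NONE = 0
    SP4_MACH_CRED = 1
    SP4_SSV = 2


class GssService(IntEnum):
    NONE = 1
    INTEGRITY = 2
    PRIVACY = 3


RPCSEC_GSS = 6  # RPC authentication flavor number


def flavor_acceptable(spa_how, auth_flavor, gss_service=None):
    """Check the flavor and service used to send EXCHANGE_ID."""
    if spa_how == StateProtectHow.SP4_NONE:
        # No state protection requested, so no flavor constraint here.
        return True
    # SP4_MACH_CRED and SP4_SSV both need RPCSEC_GSS with integrity
    # or privacy protection.
    return auth_flavor == RPCSEC_GSS and gss_service in (
        GssService.INTEGRITY,
        GssService.PRIVACY,
    )
```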
When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides
two lists of operations (each expressed as a bitmap). The first list two lists of operations (each expressed as a bitmap). The first list
is spo_must_enforce and consists of those operations the client MUST is spo_must_enforce and consists of those operations the client MUST
send (subject to the server confirming the list of operations in the send (subject to the server confirming the list of operations in the
skipping to change at page 61, line 49 skipping to change at page 80, line 5
ssp_num_gss_handles to zero; the client can create more handles ssp_num_gss_handles to zero; the client can create more handles
with another EXCHANGE_ID call. with another EXCHANGE_ID call.
Because each SSV RPCSEC_GSS handle shares a common SSV GSS Because each SSV RPCSEC_GSS handle shares a common SSV GSS
context, there are security considerations specific to this context, there are security considerations specific to this
situation discussed in Section 2.10.10 of [RFC5661]. situation discussed in Section 2.10.10 of [RFC5661].
The seq_window (see Section 5.2.3.1 of [RFC2203]) of each The seq_window (see Section 5.2.3.1 of [RFC2203]) of each
RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window
of the RPCSEC_GSS handle used for the credential of the RPC of the RPCSEC_GSS handle used for the credential of the RPC
request that the EXCHANGE_ID request was sent with. request that the EXCHANGE_ID operation was sent as a part of.
+-------------------+----------------------+------------------------+ +-------------------+----------------------+------------------------+
| Encryption | MUST NOT be combined | SHOULD NOT be combined | | Encryption | MUST NOT be combined | SHOULD NOT be combined |
| Algorithm | with | with | | Algorithm | with | with |
+-------------------+----------------------+------------------------+ +-------------------+----------------------+------------------------+
| id-aes128-CBC | | id-sha384, id-sha512 | | id-aes128-CBC | | id-sha384, id-sha512 |
| id-aes192-CBC | id-sha1 | id-sha512 | | id-aes192-CBC | id-sha1 | id-sha512 |
| id-aes256-CBC | id-sha1, id-sha224 | | | id-aes256-CBC | id-sha1, id-sha224 | |
+-------------------+----------------------+------------------------+ +-------------------+----------------------+------------------------+
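The table above appears to follow from the key length rules stated earlier in this section: key length MUST be <= hash length and SHOULD be >= hash length / 2. The sketch below checks that reading. The classify function and the two dictionaries are illustrative names; the bit lengths are the standard AES key sizes and SHA output sizes.

```python
KEY_BITS = {
    "id-aes128-CBC": 128,
    "id-aes192-CBC": 192,
    "id-aes256-CBC": 256,
}

HASH_BITS = {
    "id-sha1": 160,
    "id-sha224": 224,
    "id-sha256": 256,
    "id-sha384": 384,
    "id-sha512": 512,
}


def classify(encr_alg, hash_alg):
    """Classify a combination of encryption and hash algorithm."""
    key = KEY_BITS[encr_alg]
    digest = HASH_BITS[hash_alg]
    if key > digest:
        # Violates "key length MUST be <= hash length".
        return "MUST NOT"
    if key * 2 < digest:
        # Violates "key length SHOULD be >= hash length / 2".
        return "SHOULD NOT"
    return "ok"


def column(encr_alg, verdict):
    """List the hash algorithms that get the given verdict."""
    return [h for h in HASH_BITS if classify(encr_alg, h) == verdict]
```

For example, column("id-aes192-CBC", "MUST NOT") yields only id-sha1, matching the second row of the table.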
skipping to change at page 62, line 43 skipping to change at page 80, line 45
peer's manifesting a particular allowed behavior based on an peer's manifesting a particular allowed behavior based on an
implementation identifier but are required to interoperate as implementation identifier but are required to interoperate as
specified elsewhere in the protocol specification. specified elsewhere in the protocol specification.
Because it is possible that some implementations might violate the Because it is possible that some implementations might violate the
protocol specification and interpret the identity information, protocol specification and interpret the identity information,
implementations MUST provide facilities to allow the NFSv4 client and implementations MUST provide facilities to allow the NFSv4 client and
server to be configured to set the contents of the nfs_impl_id server to be configured to set the contents of the nfs_impl_id
structures sent to any specified value. structures sent to any specified value.
13.4. IMPLEMENTATION 14.4. IMPLEMENTATION
A server's client record is a 5-tuple: A server's client record is a 5-tuple:
1. co_ownerid 1. co_ownerid
The client identifier string, from the eia_clientowner The client identifier string, from the eia_clientowner
structure of the EXCHANGE_ID4args structure. structure of the EXCHANGE_ID4args structure.
2. co_verifier: 2. co_verifier:
A client-specific value used to indicate incarnations (where a A client-specific value used to indicate incarnations (where a
client restart represents a new incarnation), from the client restart represents a new incarnation), from the
eia_clientowner structure of the EXCHANGE_ID4args structure. eia_clientowner structure of the EXCHANGE_ID4args structure.
3. principal: 3. principal:
skipping to change at page 67, line 38 skipping to change at page 85, line 42
+ If the server subsequently receives a successful + If the server subsequently receives a successful
CREATE_SESSION that confirms clientid_ret, then the server CREATE_SESSION that confirms clientid_ret, then the server
atomically destroys the confirmed record and makes the atomically destroys the confirmed record and makes the
unconfirmed record confirmed as described in section unconfirmed record confirmed as described in section
18.36.3 of [RFC5661]. 18.36.3 of [RFC5661].
+ If the server instead subsequently receives an EXCHANGE_ID + If the server instead subsequently receives an EXCHANGE_ID
with the client owner equal to ownerid_arg, one strategy is with the client owner equal to ownerid_arg, one strategy is
to simply delete the unconfirmed record, and process the to simply delete the unconfirmed record, and process the
EXCHANGE_ID as described in the entirety of Section 13.4. EXCHANGE_ID as described in the entirety of Section 14.4.
6. Update 6. Update
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
has the following confirmed record, then this request is an has the following confirmed record, then this request is an
attempt at an update. attempt at an update.
{ ownerid_arg, verifier_arg, principal_arg, clientid_ret, { ownerid_arg, verifier_arg, principal_arg, clientid_ret,
confirmed } confirmed }
Since the record has been confirmed, the client must have Since the record has been confirmed, the client must have
received the server's reply from the initial EXCHANGE_ID received the server's reply from the initial EXCHANGE_ID
request. The server allows the update, and the client record request. The server allows the update, and the client record
is left intact. is left intact.
7. Update but No Confirmed Record 7. Update but No Confirmed Record
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
has no confirmed record corresponding to ownerid_arg, then the has no confirmed record corresponding to ownerid_arg, then the
server returns NFS4ERR_NOENT and leaves any unconfirmed record server returns NFS4ERR_NOENT and leaves any unconfirmed record
skipping to change at page 68, line 40 skipping to change at page 86, line 44
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server
has the following confirmed record, then this request is an has the following confirmed record, then this request is an
illegal attempt at an update by an unauthorized principal. illegal attempt at an update by an unauthorized principal.
{ ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret,
confirmed } confirmed }
The server returns NFS4ERR_PERM and leaves the client record The server returns NFS4ERR_PERM and leaves the client record
intact. intact.
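The update cases above can be summarized as a small decision function. This is only a sketch under stated assumptions: the function name, the dictionary-based record store, and the string status values are hypothetical. The wrong-verifier branch (NFS4ERR_NOT_SAME) follows Section 18.35.4 of [RFC5661] and is not reproduced in the text above, and the order in which the principal and verifier are checked is an assumption.

```python
def exchange_id_update(confirmed_records, ownerid_arg, verifier_arg, principal_arg):
    """Status for an EXCHANGE_ID with EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set.

    confirmed_records is a hypothetical dict mapping co_ownerid to the
    confirmed client record, here a dict with "verifier" and "principal".
    """
    record = confirmed_records.get(ownerid_arg)
    if record is None:
        # Update but no confirmed record: any unconfirmed record is untouched.
        return "NFS4ERR_NOENT"
    if record["principal"] != principal_arg:
        # Update attempted by an unauthorized principal: record left intact.
        return "NFS4ERR_PERM"
    if record["verifier"] != verifier_arg:
        # Wrong-verifier case, taken from RFC 5661 rather than the text above.
        return "NFS4ERR_NOT_SAME"
    # Matching confirmed record: the update is allowed, record left intact.
    return "NFS4_OK"
```

In no branch is the confirmed record modified, which matches the "leaves the client record intact" wording of the cases above.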
14. Security Considerations 15. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished (as
updated)
15.1. ARGUMENT
<CODE BEGINS>
struct RECLAIM_COMPLETE4args {
/*
* If rca_one_fs TRUE,
*
* CURRENT_FH: object in
* file system reclaim is
* complete for.
*/
bool rca_one_fs;
};
<CODE ENDS>
15.2. RESULTS
<CODE BEGINS>
struct RECLAIM_COMPLETE4res {
nfsstat4 rcr_status;
};
<CODE ENDS>
15.3. DESCRIPTION
A RECLAIM_COMPLETE operation is used to indicate that the client has
reclaimed all of the locking state that it will recover using
reclaim, when it is recovering state due to either a server restart
or the migration of a file system to another server. There are two
types of RECLAIM_COMPLETE operations:
o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done.
This indicates that recovery of all locks that the client held on
the previous server instance has been completed. The current
filehandle need not be set in this case.
o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE
is being done. This indicates that recovery of locks for a single
fs (the one designated by the current filehandle) due to the
migration of the file system has been completed. Presence of a
current filehandle is required when rca_one_fs is set to TRUE.
When the current filehandle designates a filehandle in a file
system not in the process of migration, the operation returns
NFS4_OK and is otherwise ignored.
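The two forms of RECLAIM_COMPLETE described above might be distinguished by a server along the following lines. This is a minimal sketch, not a definitive implementation: the function name, the use of an fsid to identify the file system of the current filehandle, and the set of migrating file systems are all hypothetical. The error returned when the current filehandle is missing is not stated in the text above, so the sketch raises a Python exception as a placeholder.

```python
def classify_reclaim_complete(rca_one_fs, current_fsid, migrating_fsids):
    """Return (scope, fsid) for a RECLAIM_COMPLETE request.

    current_fsid is the file system of the current filehandle, or None
    if no current filehandle is set (hypothetical representation).
    migrating_fsids is the set of file systems in the process of migration.
    """
    if not rca_one_fs:
        # Global form: the current filehandle need not be set.
        return ("global", None)
    if current_fsid is None:
        # Placeholder only; the protocol error to return is not given above.
        raise ValueError("current filehandle required when rca_one_fs is TRUE")
    if current_fsid not in migrating_fsids:
        # Not in the process of migration: NFS4_OK, otherwise ignored.
        return ("ignored", current_fsid)
    # Per-fs form for a file system being migrated.
    return ("per-fs", current_fsid)
```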
Once a RECLAIM_COMPLETE is done, there can be no further reclaim
operations for locks whose scope is defined as having completed
recovery. Once the client sends RECLAIM_COMPLETE, the server will
not allow the client to do subsequent reclaims of locking state for
that scope and, if these are attempted, will return NFS4ERR_NO_GRACE.
Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.
Similarly, when the client accesses a migrated file system on a new
server, before it sends the first non-reclaim operation that obtains
a lock on this new server, it MUST send a RECLAIM_COMPLETE with
rca_one_fs set to TRUE and current filehandle within that file
system, even if there are no locks to reclaim. If non-reclaim
locking operations are done on that file system before the
RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.
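The interaction between RECLAIM_COMPLETE and the errors NFS4ERR_GRACE and NFS4ERR_NO_GRACE can be illustrated with a small per-client model. This is a sketch under simplifying assumptions: the class and method names are hypothetical, file systems are identified by opaque strings, and the model ignores the fact that a server may continue to return NFS4ERR_GRACE after RECLAIM_COMPLETE until it ends its own grace period.

```python
class ClientReclaimState:
    """Hypothetical per-client view of reclaim scopes on one server."""

    def __init__(self, migrated_fsids=()):
        self.global_done = False
        # File systems for which this client still owes a per-fs
        # RECLAIM_COMPLETE because they were migrated to this server.
        self.fs_pending = set(migrated_fsids)

    def reclaim_complete(self, rca_one_fs, fsid=None):
        if not rca_one_fs:
            self.global_done = True
        else:
            self.fs_pending.discard(fsid)
        return "NFS4_OK"

    def lock(self, fsid, reclaim):
        # Recovery is complete for fsid only when the global
        # RECLAIM_COMPLETE was sent and no per-fs one is outstanding.
        done = self.global_done and fsid not in self.fs_pending
        if reclaim:
            # Reclaims are refused once the scope has completed recovery.
            return "NFS4ERR_NO_GRACE" if done else "NFS4_OK"
        # Non-reclaim locking is refused until recovery is complete.
        return "NFS4_OK" if done else "NFS4ERR_GRACE"
```

For example, a client that has sent only the global RECLAIM_COMPLETE can obtain new locks on ordinary file systems but still receives NFS4ERR_GRACE on a migrated file system until it sends the per-fs form for that file system.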
It should be noted that there are situations in which a client needs
to issue both forms of RECLAIM_COMPLETE. An example is an instance
of file system migration in which the file system is migrated to a
server for which the client has no clientid. As a result, the client
needs to obtain a clientid from the server (incurring the
responsibility to do RECLAIM_COMPLETE with rca_one_fs set to FALSE)
and also needs to do RECLAIM_COMPLETE with rca_one_fs set to TRUE to
complete the per-fs grace period associated with the file system
migration.
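A client might decide which forms of RECLAIM_COMPLETE it owes a server as sketched below. The function name, its parameters, and the dictionary representation of the arguments are hypothetical; only the rca_one_fs field comes from the protocol definition, and each operation is assumed to be sent after the corresponding reclaims have been finished.

```python
def reclaim_completes_needed(new_client_id, migrated_fs_handles):
    """List the RECLAIM_COMPLETE operations a client owes a server.

    new_client_id: True if the client has just established a new client ID.
    migrated_fs_handles: filehandles (opaque values here), one within each
    file system the client is now accessing on this server after migration.
    """
    ops = []
    if new_client_id:
        # Global form; the current filehandle need not be set.
        ops.append({"rca_one_fs": False, "current_fh": None})
    for fh in migrated_fs_handles:
        # Per-fs form; the current filehandle must lie within that fs.
        ops.append({"rca_one_fs": True, "current_fh": fh})
    return ops
```

In the migration example above, where the client had no clientid on the destination server, the result contains one operation of each form.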
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
done become non-reclaimable. The client MUST NOT attempt to reclaim
them, either during the current server instance or in any subsequent
server instance, or on another server to which responsibility for
that file system is transferred. If the client were to do so, it
would be violating the protocol by representing itself as owning
locks that it does not own, and so has no right to reclaim. See
Section 8.4.3 of [RFC5661] for a discussion of edge conditions
related to lock reclaim.
By sending a RECLAIM_COMPLETE, the client indicates readiness to
proceed to do normal non-reclaim locking operations. The client
should be aware that such operations may temporarily result in
NFS4ERR_GRACE errors until the server is ready to terminate its grace
period.
15.4. IMPLEMENTATION
Servers will typically use the information as to when reclaim
activity is complete to reduce the length of the grace period. When
the server maintains in persistent storage a list of clients that
might have had locks, it is able to use the fact that all such
clients have done a RECLAIM_COMPLETE to terminate the grace period
and begin normal operations (i.e., grant requests for new locks)
sooner than it might otherwise.
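The early-termination test described above can be sketched as a simple set comparison. The function name and its parameters are hypothetical, and the representation of time as plain numbers is an assumption made for illustration.

```python
def grace_may_end(clients_with_possible_locks, clients_reclaim_complete,
                  now, grace_deadline):
    """Decide whether the server may leave its grace period.

    clients_with_possible_locks: client identifiers recovered from
    persistent storage that might have held locks before the restart.
    clients_reclaim_complete: clients that have sent RECLAIM_COMPLETE.
    """
    if now >= grace_deadline:
        # The grace period has run its full length.
        return True
    # End early only if every client that might hold locks is done.
    return set(clients_with_possible_locks) <= set(clients_reclaim_complete)
```

Note that with an empty persistent list the subset test is trivially satisfied, so a server with no clients to wait for can begin normal operation immediately.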
Latency can be minimized by doing a RECLAIM_COMPLETE as part of the
COMPOUND request in which the last lock-reclaiming operation is done.
When there are no reclaims to be done, RECLAIM_COMPLETE should be
done immediately in order to allow the grace period to end as soon as
possible.
RECLAIM_COMPLETE should only be done once for each server instance or
occasion of the transition of a file system. If it is done a second
time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that
because of the session feature's retry protection, retries of
COMPOUND requests containing a RECLAIM_COMPLETE operation will not
result in this error.
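The difference between a retry and a genuine second RECLAIM_COMPLETE can be illustrated as follows. This is a simplified sketch: the class name is hypothetical, the reply cache keeps only the last reply per slot keyed by sequence ID, and the many other checks a real session implementation performs are omitted.

```python
class ReclaimCompleteTracker:
    """Hypothetical model of one client's session and reclaim scopes."""

    def __init__(self):
        self.slot_cache = {}          # slot -> (seqid, cached status)
        self.completed_scopes = set() # e.g. "global" or an fs identifier

    def handle(self, slot, seqid, scope):
        cached = self.slot_cache.get(slot)
        if cached is not None and cached[0] == seqid:
            # Retry of the same request: replay the cached reply
            # without executing the operation again.
            return cached[1]
        if scope in self.completed_scopes:
            status = "NFS4ERR_COMPLETE_ALREADY"
        else:
            self.completed_scopes.add(scope)
            status = "NFS4_OK"
        self.slot_cache[slot] = (seqid, status)
        return status
```

A retransmission with the same slot and sequence ID receives the original NFS4_OK, while a new request repeating RECLAIM_COMPLETE for the same scope receives NFS4ERR_COMPLETE_ALREADY.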
When a RECLAIM_COMPLETE is sent, the client effectively acknowledges
any locks not yet reclaimed as lost. This allows the server to re-
enable the client to recover locks if the occurrence of edge
conditions, as described in Section 8.4.3 of [RFC5661], had caused
the server to disable the client's ability to recover locks.
Because previous descriptions of RECLAIM_COMPLETE were not
sufficiently explicit about the circumstances in which use of
RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there
have been cases in which it has been misused by clients, and cases in
which servers have, in various ways, not responded to such misuse as
described above. While clients SHOULD NOT misuse this feature and
servers SHOULD respond to such misuse as described above,
implementers need to be aware of the following considerations as they
make necessary tradeoffs between interoperability with existing
implementations and proper support for facilities to allow lock
recovery in the event of file system migration.
o When servers have no support for becoming the destination server
of a file system subject to migration, there is no possibility of
a per-fs RECLAIM_COMPLETE being done legitimately and occurrences
of it SHOULD be ignored. However, the negative consequences of
accepting mistaken use are quite limited as long as the client does
not issue it before all necessary reclaims are done.
o When a server might become the destination for a file system being
migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more
concerning. In the case in which the file system designated is
not within a per-fs grace period, it SHOULD be ignored, with the
negative consequences of accepting it being limited, as in the
case in which migration is not supported. However, if a per-fs
RECLAIM_COMPLETE designates a file system undergoing migration, it
cannot be accepted as if it were a global RECLAIM_COMPLETE without
invalidating its intended use.
16. Security Considerations
The Security Considerations section of [RFC5661] needs the additions The Security Considerations section of [RFC5661] needs the additions
below to properly address some aspects of trunking discovery, below to properly address some aspects of trunking discovery,
referral, migration and replication. referral, migration and replication.
The possibility that requests to determine the set of network The possibility that requests to determine the set of network
addresses corresponding to a given server might be interfered with addresses corresponding to a given server might be interfered with
or have their responses corrupted needs to be taken into account. or have their responses corrupted needs to be taken into account.
In light of this, the following considerations should be taken In light of this, the following considerations should be taken
note of: note of:
skipping to change at page 70, line 45 skipping to change at page 92, line 19
o The use of requests issued without RPCSEC_GSS (i.e. using o The use of requests issued without RPCSEC_GSS (i.e. using
AUTH_SYS), while undesirable, may not be avoidable in all AUTH_SYS), while undesirable, may not be avoidable in all
cases. Where the use of the returned information cannot be cases. Where the use of the returned information cannot be
avoided, it should be subject to filtering to eliminate the avoided, it should be subject to filtering to eliminate the
possibility that the client would treat an invalid address as possibility that the client would treat an invalid address as
if it were a NFSv4 server. The specifics will vary depending if it were a NFSv4 server. The specifics will vary depending
on the degree of network isolation and whether the request is on the degree of network isolation and whether the request is
to the referring or destination servers. to the referring or destination servers.
15. IANA Considerations 17. IANA Considerations
This document does not require actions by IANA. This document does not require actions by IANA.
16. References 18. References
16.1. Normative References 18.1. Normative References
[CSOR_AES] [CSOR_AES]
National Institute of Standards and Technology, National Institute of Standards and Technology,
"Cryptographic Algorithm Object Registration", URL "Cryptographic Algorithm Object Registration", URL
http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/ http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/
algorithms.html, November 2007. algorithms.html, November 2007.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
skipping to change at page 72, line 19 skipping to change at page 93, line 36
[RFC7931] Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker, [RFC7931] Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker,
"NFSv4.0 Migration: Specification Update", RFC 7931, "NFSv4.0 Migration: Specification Update", RFC 7931,
DOI 10.17487/RFC7931, July 2016, DOI 10.17487/RFC7931, July 2016,
<https://www.rfc-editor.org/info/rfc7931>. <https://www.rfc-editor.org/info/rfc7931>.
[RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct [RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
Memory Access Transport for Remote Procedure Call Version Memory Access Transport for Remote Procedure Call Version
1", RFC 8166, DOI 10.17487/RFC8166, June 2017, 1", RFC 8166, DOI 10.17487/RFC8166, June 2017,
<https://www.rfc-editor.org/info/rfc8166>. <https://www.rfc-editor.org/info/rfc8166>.
16.2. Informative References [RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor
Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017,
<https://www.rfc-editor.org/info/rfc8178>.
18.2. Informative References
[I-D.cel-nfsv4-mv0-trunking-update] [I-D.cel-nfsv4-mv0-trunking-update]
Lever, C. and D. Noveck, "NFS version 4.0 Trunking Lever, C. and D. Noveck, "NFS version 4.0 Trunking
Update", draft-cel-nfsv4-mv0-trunking-update-00 (work in Update", draft-cel-nfsv4-mv0-trunking-update-00 (work in
progress), November 2017. progress), November 2017.
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
Hashing for Message Authentication", RFC 2104, Hashing for Message Authentication", RFC 2104,
DOI 10.17487/RFC2104, February 1997, DOI 10.17487/RFC2104, February 1997,
<https://www.rfc-editor.org/info/rfc2104>. <https://www.rfc-editor.org/info/rfc2104>.
skipping to change at page 73, line 20 skipping to change at page 94, line 42
o Section 4.5.7 is an additional section. o Section 4.5.7 is an additional section.
o Section 5 is explanatory. o Section 5 is explanatory.
o Sections 6 and 7 are additional sections. o Sections 6 and 7 are additional sections.
o Sections 8 through 8.9, a total of ten sections, are all o Sections 8 through 8.9, a total of ten sections, are all
replacement sections. replacement sections.
o Sections 9 through 11.2, a total of eleven sections, are all o Sections 9 through 11.3, a total of twelve sections, are all
additional sections. additional sections.
o Section 12 is explanatory. o Section 12.1 is explanatory.
o Sections 12.1 and 12.2 are replacement sections. o Sections 12.2 through 12.2.3, a total of four sections, are all
replacement sections.
o Sections 12.3 and 12.4 are editing sections. o Section 13 is explanatory.
o Section 12.5 is explanatory. o Sections 13.1 and 13.2 are replacement sections.
o Section 13 is a replacement section, which consists of a total of o Sections 13.3 and 13.4 are editing sections.
o Sections 13.5 and 13.6 are explanatory.
o Section 13.7 is a replacement section, which consists of a total of
six sections.
o Section 14 is a replacement section, which consists of a total of
five sections. five sections.
o Section 14 is an editing section. o Section 15 is a replacement section, which consists of a total of
five sections.
o Section 15 through Acknowledgments, a total of six sections, are o Section 16 is an editing section.
o Section 17 through Acknowledgments, a total of six sections, are
all explanatory. all explanatory.
To summarize: To summarize:
o There are fifteen explanatory sections. o There are seventeen explanatory sections.
o There are twenty-two replacement sections. o There are thirty-seven replacement sections.
o There are eighteen additional sections. o There are eighteen additional sections.
o There are three editing sections. o There are three editing sections.
Appendix B. Updates to RFC5661 Appendix B. Updates to RFC5661
In this appendix, we proceed through [RFC5661] identifying sections In this appendix, we proceed through [RFC5661] identifying sections
as unchanged, modified, deleted, or replaced and indicating where as unchanged, modified, deleted, or replaced and indicating where
additional sections from the current document would appear in an additional sections from the current document would appear in an
eventual consolidated description of NFSv4.1. In this presentation, eventual consolidated description of NFSv4.1. In this presentation,
when section X is referred to, it denotes that section plus all when section X is referred to, it denotes that section plus all
included subsections. When it is necessary to refer to the part of a included subsections. When it is necessary to refer to the part of a
section outside any included subsections, the exclusion is noted section outside any included subsections, the exclusion is noted
explicitly. explicitly.
o Section 1 is unmodified except that Section 1.7.3.3 is to be o Section 1 is unmodified except that Section 1.7.3.3 is to be
replaced by Section 12.1 from the current document. replaced by Section 13.1 from the current document.
o Section 2 is unmodified except for the specific items listed o Section 2 is unmodified except for the specific items listed
below: below:
o Section 2.10.4 is replaced by Section 12.2 from the current o Section 2.10.4 is replaced by Section 13.2 from the current
document. document.
o Section 2.10.5 is modified as discussed in Section 12.4 of the o Section 2.10.5 is modified as discussed in Section 13.4 of the
current document. current document.
o Sections 3 through 10 are unchanged. o Sections 3 through 10 are unchanged.
o Section 11 is extensively modified as discussed below. o Section 11 is extensively modified as discussed below.
o Section 11, exclusive of subsections, is replaced by Sections o Section 11, exclusive of subsections, is replaced by Sections
4.1 and 4.2 from the current document. 4.1 and 4.2 from the current document.
o Section 11.1 is replaced by Section 4.3 from the current o Section 11.1 is replaced by Section 4.3 from the current
skipping to change at page 75, line 39 skipping to change at page 97, line 18
o Section 11.7.7, exclusive of subsections, is replaced by o Section 11.7.7, exclusive of subsections, is replaced by
Section 8.9. Sections 11.7.7.1 and 11.7.7.2 are unchanged. Section 8.9. Sections 11.7.7.1 and 11.7.7.2 are unchanged.
o Section 11.7.8 is replaced by Section 8.6 o Section 11.7.8 is replaced by Section 8.6
o Section 11.7.9 is replaced by Section 8.7 o Section 11.7.9 is replaced by Section 8.7
o Section 11.7.10 is replaced by Section 8.8 o Section 11.7.10 is replaced by Section 8.8
o Sections 11.8, 11.8.1, 11.8.2, 11.9, 11.10, 11.10.1, 11.10.2, o Sections 11.8, 11.8.1, 11.8.2, and 11.9 are unchanged.
11.10.3, and 11.11 are unchanged.
o Sections 11.10, 11.10.1, 11.10.2, and 11.10.3 are replaced by
Sections 12.2 through 12.2.3.
o Section 11.11 is unchanged.
o New sections corresponding to Sections 9, 10, and 11 from the o New sections corresponding to Sections 9, 10, and 11 from the
current document appear next as additional sub-sections of current document appear next as additional sub-sections of
Section 11. Each of these has subsections, so there is a total Section 11. Each of these has subsections, so there is a total
of seventeen sections added. of seventeen sections added.
o Sections 12 through 14 are unchanged. o Sections 12 through 14 are unchanged.
o Section 15 is unmodified except that the description of o Section 15 is unmodified except that
NFS4ERR_MOVED in Section 15.1 is revised as described in
Section 12.3 of the current document. * The description of NFS4ERR_MOVED in Section 15.1 is revised as
described in Section 13.3 of the current document.
* The description of the reclaim-related errors in section 15.1.9
is replaced by the revised descriptions in Section 13.7 of the
current document.
o Sections 16 and 17 are unchanged. o Sections 16 and 17 are unchanged.
o Section 18 is unmodified except that section 18.35 is replaced by o Section 18 is unmodified except that
Section 13 in the current document.
* Section 18.35 is replaced by Section 14 in the current
document.
* Section 18.51 is replaced by Section 15 in the current
document.
o Sections 19 through 23 are unchanged. o Sections 19 through 23 are unchanged.
In terms of top-level sections, exclusive of appendices: In terms of top-level sections, exclusive of appendices:
o There is one heavily modified top-level section (Section 11) o There is one heavily modified top-level section (Section 11)
o There are four other modified top-level sections (Sections 1, 2, o There are four other modified top-level sections (Sections 1, 2,
15, and 18). 15, and 18).
skipping to change at page 76, line 40 skipping to change at page 98, line 33
o Sections outside Section 11. o Sections outside Section 11.
In this table, the counts for top-level sections and TOC entries are In this table, the counts for top-level sections and TOC entries are
for sections including subsections while other counts are for for sections including subsections while other counts are for
sections exclusive of included subsections. sections exclusive of included subsections.
+------------+------+------+--------+------------+--------+ +------------+------+------+--------+------------+--------+
| Status | Top | TOC | in 11 | not in 11 | Total | | Status | Top | TOC | in 11 | not in 11 | Total |
+------------+------+------+--------+------------+--------+ +------------+------+------+--------+------------+--------+
| Replaced | 0 | 3 | 17 | 7 | 24 | | Replaced | 0 | 6 | 21 | 15 | 36 |
| Added | 0 | 6 | 23 | 0 | 23 | | Added | 0 | 5 | 24 | 0 | 24 |
| Deleted | 0 | 1 | 4 | 0 | 4 | | Deleted | 0 | 1 | 4 | 0 | 4 |
| Modified | 5 | 4 | 0 | 2 | 2 | | Modified | 5 | 3 | 0 | 2 | 2 |
| Unchanged | 18 | 212 | 16 | 918 | 934 | | Unchanged | 18 | 210 | 12 | 910 | 922 |
| in RFC5661 | 23 | 220 | 37 | 927 | 964 | | in RFC5661 | 23 | 220 | 37 | 927 | 964 |
+------------+------+------+--------+------------+--------+ +------------+------+------+--------+------------+--------+
Acknowledgments Acknowledgments
The authors wish to acknowledge the important role of Andy Adamson of The authors wish to acknowledge the important role of Andy Adamson of
Netapp in clarifying the need for trunking discovery functionality, Netapp in clarifying the need for trunking discovery functionality,
and exploring the role of the location attributes in providing the and exploring the role of the location attributes in providing the
necessary support. necessary support.
The authors also wish to acknowledge the work of Xuan Qi of Oracle The authors also wish to acknowledge the work of Xuan Qi of Oracle
with NFSv4.1 client and server prototypes of transparent state with NFSv4.1 client and server prototypes of transparent state
migration functionality. migration functionality.
The authors wish to thank Trond Myklebust of Primary Data for his The authors wish to thank others that brought attention to important
comments related to trunking, helping to clarify the role of DNS in issues. The comments of Trond Myklebust of Primary Data related to
trunking discovery. trunking helped to clarify the role of DNS in trunking discovery.
Rick Macklem's comments brought attention to problems in the handling
of the per-fs version of RECLAIM_COMPLETE.
The authors wish to thank Olga Kornievskaia of Netapp for her helpful The authors wish to thank Olga Kornievskaia of Netapp for her helpful
review comments. review comments.
Authors' Addresses Authors' Addresses
David Noveck (editor) David Noveck (editor)
NetApp NetApp
1601 Trapelo Road 1601 Trapelo Road
Waltham, MA 02451 Waltham, MA 02451
 End of changes. 136 change blocks. 
280 lines changed or deleted 1327 lines changed or added
