draft-ietf-nfsv4-mv1-msns-update-01.txt | draft-ietf-nfsv4-mv1-msns-update-02.txt | |||
---|---|---|---|---|
NFSv4 D. Noveck, Ed. | NFSv4 D. Noveck, Ed. | |||
Internet-Draft NetApp | Internet-Draft NetApp | |||
Updates: 5661 (if approved) C. Lever | Updates: 5661 (if approved) C. Lever | |||
Intended status: Standards Track ORACLE | Intended status: Standards Track ORACLE | |||
Expires: December 11, 2018 June 9, 2018 | Expires: April 24, 2019 October 21, 2018 | |||
NFSv4.1 Update for Multi-Server Namespace | NFS Version 4.1 Update for Multi-Server Namespace | |||
draft-ietf-nfsv4-mv1-msns-update-01 | draft-ietf-nfsv4-mv1-msns-update-02 | |||
Abstract | Abstract | |||
This document presents necessary clarifications and corrections | This document presents necessary clarifications and corrections | |||
concerning features related to the use of location-related attributes | concerning features related to the use of location-related attributes | |||
in NFSv4.1. These include migration, which transfers responsibility | in NFSv4.1. These include migration, which transfers responsibility | |||
for a file system from one server to another, and facilities to | for a file system from one server to another, and facilities to | |||
support trunking by allowing discovery of the set of network | support trunking by allowing discovery of the set of network | |||
addresses to use to access a file system. This document updates | addresses to use to access a file system. This document updates | |||
RFC5661. | RFC5661. | |||
skipping to change at page 1, line 37 ¶ | skipping to change at page 1, line 37 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on December 11, 2018. | This Internet-Draft will expire on April 24, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 | 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
3.2. Summary of Issues . . . . . . . . . . . . . . . . . . . . 6 | 3.2. Summary of Issues . . . . . . . . . . . . . . . . . . . . 7 | |||
3.3. Relationship of this Document to RFC5661 . . . . . . . . 8 | 3.3. Relationship of this Document to RFC5661 . . . . . . . . 9 | |||
4. Changes to Section 11 of RFC5661 . . . . . . . . . . . . . . 9 | 4. Changes to Section 11 of RFC5661 . . . . . . . . . . . . . . 10 | |||
4.1. Multi-Server Namespace (as updated) . . . . . . . . . . . 10 | 4.1. Multi-Server Namespace (as updated) . . . . . . . . . . . 11 | |||
4.2. Location-related Terminology (to be added) . . . . . . . 10 | 4.2. Location-related Terminology (to be added) . . . . . . . 11 | |||
4.3. Location Attributes (as updated) . . . . . . . . . . . . 12 | 4.3. Location Attributes (as updated) . . . . . . . . . . . . 13 | |||
4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 . . 13 | 4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 . . 14 | |||
4.5. Uses of Location Information (as updated) . . . . . . . . 13 | 4.5. Uses of Location Information (as updated) . . . . . . . . 14 | |||
4.5.1. Combining Multiple Uses in a Single Attribute (to be | 4.5.1. Combining Multiple Uses in a Single Attribute (to be | |||
added) . . . . . . . . . . . . . . . . . . . . . . . 14 | added) . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
4.5.2. Location Attributes and Trunking (to be added) . . . 15 | 4.5.2. Location Attributes and Trunking (to be added) . . . 16 | |||
4.5.3. Location Attributes and Connection Type Selection (to | 4.5.3. Location Attributes and Connection Type Selection (to | |||
be added) . . . . . . . . . . . . . . . . . . . . . . 15 | be added) . . . . . . . . . . . . . . . . . . . . . . 16 | |||
4.5.4. File System Replication (as updated) . . . . . . . . 16 | 4.5.4. File System Replication (as updated) . . . . . . . . 17 | |||
4.5.5. File System Migration (as updated) . . . . . . . . . 16 | 4.5.5. File System Migration (as updated) . . . . . . . . . 17 | |||
4.5.6. Referrals (as updated) . . . . . . . . . . . . . . . 17 | 4.5.6. Referrals (as updated) . . . . . . . . . . . . . . . 19 | |||
4.5.7. Changes in a Location Attribute (to be added) . . . . 19 | 4.5.7. Changes in a Location Attribute (to be added) . . . . 20 | |||
5. Re-organization of Section 11.7 of RFC5661 . . . . . . . . . 20 | 5. Re-organization of Section 11.7 of RFC5661 . . . . . . . . . 21 | |||
6. Overview of File Access Transitions (to be added) . . . . . . 20 | 6. Overview of File Access Transitions (to be added) . . . . . . 22 | |||
7. Effecting Network Endpoint Transitions (to be added) . . . . 21 | 7. Effecting Network Endpoint Transitions (to be added) . . . . 22 | |||
8. Effecting File System Transitions (as updated) . . . . . . . 22 | 8. Effecting File System Transitions (as updated) . . . . . . . 23 | |||
8.1. File System Transitions and Simultaneous Access (as | 8.1. File System Transitions and Simultaneous Access (as | |||
updated) . . . . . . . . . . . . . . . . . . . . . . . . 23 | updated) . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
8.2. Filehandles and File System Transitions (as updated) . . 23 | 8.2. Filehandles and File System Transitions (as updated) . . 24 | |||
8.3. Fileids and File System Transitions (as updated) . . . . 24 | 8.3. Fileids and File System Transitions (as updated) . . . . 25 | |||
8.4. Fsids and File System Transitions (as updated) . . . . . 25 | 8.4. Fsids and File System Transitions (as updated) . . . . . 26 | |||
8.4.1. File System Splitting (as updated) . . . . . . . . . 25 | 8.4.1. File System Splitting (as updated) . . . . . . . . . 26 | |||
8.5. The Change Attribute and File System Transitions (as | 8.5. The Change Attribute and File System Transitions (as | |||
updated) . . . . . . . . . . . . . . . . . . . . . . . . 26 | updated) . . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
8.6. Write Verifiers and File System Transitions (as updated) 26 | 8.6. Write Verifiers and File System Transitions (as updated) 27 | |||
8.7. Readdir Cookies and Verifiers and File System Transitions | 8.7. Readdir Cookies and Verifiers and File System Transitions | |||
(as updated) . . . . . . . . . . . . . . . . . . . . . . 26 | (as updated) . . . . . . . . . . . . . . . . . . . . . . 28 | |||
8.8. File System Data and File System Transitions (as updated) 27 | 8.8. File System Data and File System Transitions (as updated) 28 | |||
8.9. Lock State and File System Transitions (as updated) . . . 28 | 8.9. Lock State and File System Transitions (as updated) . . . 29 | |||
9. Transferring State upon Migration (to be added) . . . . . . . 29 | 9. Transferring State upon Migration (to be added) . . . . . . . 30 | |||
9.1. Transparent State Migration and pNFS (to be added) . . . 29 | 9.1. Transparent State Migration and pNFS (to be added) . . . 31 | |||
10. Client Responsibilities when Access is Transitioned (to be | 10. Client Responsibilities when Access is Transitioned (to be | |||
added) . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | added) . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
10.1. Client Transition Notifications (to be added) . . . . . 31 | 10.1. Client Transition Notifications (to be added) . . . . . 32 | |||
10.2. Performing Migration Discovery (to be added) . . . . . . 33 | 10.2. Performing Migration Discovery (to be added) . . . . . . 35 | |||
10.3. Overview of Client Response to NFS4ERR_MOVED (to be | 10.3. Overview of Client Response to NFS4ERR_MOVED (to be | |||
added) . . . . . . . . . . . . . . . . . . . . . . . . . 36 | added) . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
10.4. Obtaining Access to Sessions and State after Migration | 10.4. Obtaining Access to Sessions and State after Migration | |||
(to be added) . . . . . . . . . . . . . . . . . . . . . 37 | (to be added) . . . . . . . . . . . . . . . . . . . . . 39 | |||
10.5. Obtaining Access to Sessions and State after Network | 10.5. Obtaining Access to Sessions and State after Network | |||
Address Transfer (to be added) . . . . . . . . . . . . . 39 | Address Transfer (to be added) . . . . . . . . . . . . . 41 | |||
11. Server Responsibilities Upon Migration (to be added) . . . . 40 | 11. Server Responsibilities Upon Migration (to be added) . . . . 41 | |||
11.1. Server Responsibilities in Effecting Transparent State | 11.1. Server Responsibilities in Effecting State Reclaim after | |||
Migration (to be added) . . . . . . . . . . . . . . . . 40 | Migration (to be added) . . . . . . . . . . . . . . . . 42 | |||
11.2. Server Responsibilities in Effecting Session Transfer | 11.2. Server Responsibilities in Effecting Transparent State | |||
(to be added) . . . . . . . . . . . . . . . . . . . . . 42 | Migration (to be added) . . . . . . . . . . . . . . . . 42 | |||
12. Changes to RFC5661 outside Section 11 . . . . . . . . . . . . 44 | 11.3. Server Responsibilities in Effecting Session Transfer | |||
12.1. (Introduction to) Multi-Server Namespace (as updated) . 45 | (to be added) . . . . . . . . . . . . . . . . . . . . . 44 | |||
12.2. Server Scope (as updated) . . . . . . . . . . . . . . . 46 | 12. fs_locations_info . . . . . . . . . . . . . . . . . . . . . . 46 | |||
12.3. Revised Treatment of NFS4ERR_MOVED . . . . . . . . . . . 47 | 12.1. Updates to treatment of fs_locations_info . . . . . . . 47 | |||
12.4. Revised Discussion of Server_owner changes . . . . . . . 48 | 12.2. The Attribute fs_locations_info (as updated) . . . . . . 47 | |||
12.5. Revision to Treatment of EXCHANGE_ID . . . . . . . . . . 49 | 12.2.1. The fs_locations_server4 Structure (as updated) . . 51 | |||
13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as | 12.2.2. The fs_locations_info4 Structure (as updated) . . . 57 | |||
updated) . . . . . . . . . . . . . . . . . . . . . . . . . . 50 | 12.2.3. The fs_locations_item4 Structure (as updated) . . . 59 | |||
14. Security Considerations . . . . . . . . . . . . . . . . . . . 68 | 13. Changes to RFC5661 outside Section 11 . . . . . . . . . . . . 61 | |||
15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 70 | 13.1. (Introduction to) Multi-Server Namespace (as updated) . 62 | |||
16. References . . . . . . . . . . . . . . . . . . . . . . . . . 71 | 13.2. Server Scope (as updated) . . . . . . . . . . . . . . . 62 | |||
16.1. Normative References . . . . . . . . . . . . . . . . . . 71 | 13.3. Revised Treatment of NFS4ERR_MOVED . . . . . . . . . . . 64 | |||
16.2. Informative References . . . . . . . . . . . . . . . . . 72 | 13.4. Revised Discussion of Server_owner changes . . . . . . . 65 | |||
Appendix A. Classification of Document Sections . . . . . . . . 72 | 13.5. Revision to Treatment of EXCHANGE_ID . . . . . . . . . . 65 | |||
Appendix B. Updates to RFC5661 . . . . . . . . . . . . . . . . . 74 | 13.6. Revision to Treatment of RECLAIM_COMPLETE . . . . . . . 67 | |||
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 76 | 13.7. Reclaim Errors (as updated) . . . . . . . . . . . . . . 67 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 77 | 13.7.1. NFS4ERR_COMPLETE_ALREADY (as updated; Error Code | |||
10054) . . . . . . . . . . . . . . . . . . . . . . . 67 | ||||
13.7.2. NFS4ERR_GRACE (as updated; Error Code 10013) . . . . 67 | ||||
13.7.3. NFS4ERR_NO_GRACE (as updated; Error Code 10033) . . 67 | ||||
13.7.4. NFS4ERR_RECLAIM_BAD (as updated; Error Code 10034) . 68 | ||||
13.7.5. NFS4ERR_RECLAIM_CONFLICT (as updated; Error Code | ||||
10035) . . . . . . . . . . . . . . . . . . . . . . . 68 | ||||
14. Operation 42: EXCHANGE_ID - Instantiate Client ID (as | ||||
updated) . . . . . . . . . . . . . . . . . . . . . . . . . . 68 | ||||
15. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished | ||||
(as updated) . . . . . . . . . . . . . . . . . . . . . . . . 86 | ||||
16. Security Considerations . . . . . . . . . . . . . . . . . . . 90 | ||||
17. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 92 | ||||
18. References . . . . . . . . . . . . . . . . . . . . . . . . . 92 | ||||
18.1. Normative References . . . . . . . . . . . . . . . . . . 92 | ||||
18.2. Informative References . . . . . . . . . . . . . . . . . 93 | ||||
Appendix A. Classification of Document Sections . . . . . . . . 94 | ||||
Appendix B. Updates to RFC5661 . . . . . . . . . . . . . . . . . 95 | ||||
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 98 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 99 | ||||
1. Introduction | 1. Introduction | |||
This document defines the proper handling, within NFSv4.1, of the | This document defines the proper handling, within NFSv4.1, of the | |||
location-related attributes fs_locations and fs_locations_info and | location-related attributes fs_locations and fs_locations_info and | |||
how necessary changes in those attributes are to be dealt with. The | how necessary changes in those attributes are to be dealt with. The | |||
necessary corrections and clarifications parallel those done for | necessary corrections and clarifications parallel those done for | |||
NFSv4.0 in [RFC7931] and [I-D.cel-nfsv4-mv0-trunking-update]. | NFSv4.0 in [RFC7931] and [I-D.cel-nfsv4-mv0-trunking-update]. | |||
A large part of the changes to be made are necessary to clarify the | A large part of the changes to be made are necessary to clarify the | |||
skipping to change at page 4, line 10 ¶ | skipping to change at page 4, line 29 ¶ | |||
Another important issue to be dealt with concerns the handling of | Another important issue to be dealt with concerns the handling of | |||
multiple entries within location-related attributes that represent | multiple entries within location-related attributes that represent | |||
different ways to access the same file system. Unfortunately | different ways to access the same file system. Unfortunately | |||
[RFC5661], while recognizing that these entries can represent | [RFC5661], while recognizing that these entries can represent | |||
different ways to access the same file system, confuses the matter by | different ways to access the same file system, confuses the matter by | |||
treating network access paths as "replicas", making it difficult for | treating network access paths as "replicas", making it difficult for | |||
these attributes to be used to obtain information about the network | these attributes to be used to obtain information about the network | |||
addresses to be used to access particular file system instances and | addresses to be used to access particular file system instances and | |||
engendering confusion between two different sorts of transition: | engendering confusion between two different sorts of transition: | |||
those involving a change of network access paths to the same file | those involving a change of network access paths to the same file | |||
system instance and those in which there is shift between two | system instance and those in which there is a shift between two | |||
distinct replicas. | distinct replicas. | |||
When location information is used to determine the set of network | When location information is used to determine the set of network | |||
addresses to access a particular file system instance (i.e. to | addresses to access a particular file system instance (i.e. to | |||
perform trunking discovery), clarification is needed regarding the | perform trunking discovery), clarification is needed regarding the | |||
interaction of trunking and transitions between file system replicas, | interaction of trunking and transitions between file system replicas, | |||
including migration. Unfortunately [RFC5661], while it provided a | including migration. Unfortunately [RFC5661], while it provided a | |||
method of determining whether two network addresses were connected to | method of determining whether two network addresses were connected to | |||
the same server, did not address the issue of trunking discovery, | the same server, did not address the issue of trunking discovery, | |||
making it necessary to address it in this document. | making it necessary to address it in this document. | |||
skipping to change at page 5, line 15 ¶ | skipping to change at page 5, line 37 ¶ | |||
version, and, in some cases, on the client implementation. | version, and, in some cases, on the client implementation. | |||
In the case of NFS version 4.1 and later minor versions, the means | In the case of NFS version 4.1 and later minor versions, the means | |||
of trunking detection are as described by [RFC5661] and are | of trunking detection are as described by [RFC5661] and are | |||
available to every client. Two network addresses connected to the | available to every client. Two network addresses connected to the | |||
same server are always server-trunkable but are not necessarily | same server are always server-trunkable but are not necessarily | |||
session-trunkable. | session-trunkable. | |||
o Trunking discovery is a process by which a client using one | o Trunking discovery is a process by which a client using one | |||
network address can obtain other addresses that are connected to | network address can obtain other addresses that are connected to | |||
the same server Typically it builds on a trunking detection | the same server. Typically it builds on a trunking detection | |||
facility by providing one or more methods by which candidate | facility by providing one or more methods by which candidate | |||
addresses are made available to the client who can then use | addresses are made available to the client who can then use | |||
trunking detection to appropriately filter them. | trunking detection to appropriately filter them. | |||
Despite the support for trunking detection there was no | Despite the support for trunking detection there was no | |||
description of trunking discovery provided in [RFC5661]. | description of trunking discovery provided in [RFC5661]. | |||
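The relationship between trunking detection and trunking discovery described above can be sketched as follows. This is only an illustrative sketch, not text from either draft revision. It assumes the client compares the server scope and the server_owner major and minor identifiers returned by EXCHANGE_ID, which is a simplification of the trunking detection described in [RFC5661]; in particular it omits client ID comparison and any verification of the server's claims. The names ServerIdentity, server_trunkable, session_trunkable, discover_trunked_addresses, and probe are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ServerIdentity:
    """Hypothetical summary of identity fields taken from an EXCHANGE_ID reply."""
    server_scope: bytes  # assumed to mirror eir_server_scope
    major_id: bytes      # assumed to mirror server_owner so_major_id
    minor_id: int        # assumed to mirror server_owner so_minor_id


def server_trunkable(a: ServerIdentity, b: ServerIdentity) -> bool:
    # Simplified detection: same scope and same major id means same server.
    return a.server_scope == b.server_scope and a.major_id == b.major_id


def session_trunkable(a: ServerIdentity, b: ServerIdentity) -> bool:
    # Session trunking additionally requires matching minor ids.
    return server_trunkable(a, b) and a.minor_id == b.minor_id


def discover_trunked_addresses(
    current: ServerIdentity,
    candidates: Iterable[str],
    probe: Callable[[str], Optional[ServerIdentity]],
) -> List[str]:
    """Filter candidate addresses (discovery) using trunking detection.

    `probe` is a hypothetical callback that contacts an address and returns
    the identity it reports, or None if the address is unreachable.
    """
    trunked = []
    for addr in candidates:
        identity = probe(addr)
        if identity is not None and server_trunkable(current, identity):
            trunked.append(addr)
    return trunked
```

Here the candidate list stands in for whatever discovery method supplies addresses (for example, a location attribute), while the two predicates stand in for trunking detection; an address can pass the server-trunkable check without being session-trunkable.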
Regarding network addresses and the handling of trunking we use the | Regarding network addresses and the handling of trunking we use the | |||
following terminology: | following terminology: | |||
o Each NFSv4 server is assumed to have a set of IP addresses to | o Each NFSv4 server is assumed to have a set of IP addresses to | |||
which NFSv4 requests may be sent by clients. These are referred | which NFSv4 requests may be sent by clients. These are referred | |||
to as the server's network addresses. Access to a specfic server | to as the server's network addresses. Access to a specific server | |||
network address may involve the use of multiple ports, since the | network address may involve the use of multiple ports, since the | |||
ports to be used for various types of connections might be | ports to be used for various types of connections might be | |||
required to be different. | required to be different. | |||
o Each network address, when combined with a pathname providing the | o Each network address, when combined with a pathname providing the | |||
location of a file system root directory relative to the | location of a file system root directory relative to the | |||
associated server root file handle, defines a file system network | associated server root file handle, defines a file system network | |||
access path. | access path. | |||
o Server network addresses are used to establish connections to | o Server network addresses are used to establish connections to | |||
skipping to change at page 7, line 34 ¶ | skipping to change at page 8, line 8 ¶ | |||
instance (i.e. trunking) was often treated as if two replicas were | instance (i.e. trunking) was often treated as if two replicas were | |||
involved, it was considered that two replicas were being used | involved, it was considered that two replicas were being used | |||
simultaneously. As a result, the treatment of replicas being used | simultaneously. As a result, the treatment of replicas being used | |||
simultaneously in [RFC5661] was not clear as it covered the two | simultaneously in [RFC5661] was not clear as it covered the two | |||
distinct cases of a single file system instance being accessed by | distinct cases of a single file system instance being accessed by | |||
two different network access paths and two replicas being accessed | two different network access paths and two replicas being accessed | |||
simultaneously, with the limitations of the latter case not being | simultaneously, with the limitations of the latter case not being | |||
clearly laid out. | clearly laid out. | |||
The majority of the consequences of these issues are dealt with via | The majority of the consequences of these issues are dealt with via | |||
the updates in various subsections of Section 4 of the current | the updates in various subsections of Section 4 and the whole of | |||
document which deal with problems within Section 11 of [RFC5661]. | Section 12 within the current document which deal with problems | |||
These include: | within Section 11 of [RFC5661]. These changes include: | |||
o Reorganization made necessary by the fact that two network access | o Reorganization made necessary by the fact that two network access | |||
paths to the same file system instance need to be distinguished | paths to the same file system instance need to be distinguished | |||
clearly from two different replicas since the former share locking | clearly from two different replicas since the former share locking | |||
state and can share session state. | state and can share session state. | |||
o The need for a clear statement regarding the desirability of | o The need for a clear statement regarding the desirability of | |||
transparent transfer of state together with a recommendation that | transparent transfer of state together with a recommendation that | |||
either that or a single-fs grace period be provided. | either that or a single-fs grace period be provided. | |||
o Specifically delineating how such transfers are to be dealt with | o Specifically delineating how such transfers are to be dealt with | |||
by the client, taking into account the differences from the | by the client, taking into account the differences from the | |||
treatment in [RFC7931] made necessary by the major protocol | treatment in [RFC7931] made necessary by the major protocol | |||
changes made in NFSv4.1. | changes made in NFSv4.1. | |||
o Discussion of the relationship between transparent state transfer | o Discussion of the relationship between transparent state transfer | |||
and Parallel NFS (pNFS). | and Parallel NFS (pNFS). | |||
o A clarification of the fs_locations_info attribute to specify | ||||
which portions of the information provided apply to a specific | ||||
network access path and which to the replica which that path is | ||||
used to access. | ||||
In addition, there are also updates to other sections of [RFC5661], | In addition, there are also updates to other sections of [RFC5661], | |||
where the consequences of the incorrect assumptions underlying the | where the consequences of the incorrect assumptions underlying the | |||
current treatment of multi-server namespace issues also need to be | current treatment of multi-server namespace issues also need to be | |||
corrected. These are to be dealt with as described in various | corrected. These are to be dealt with as described in Sections 13 | |||
subsections of Section 12 of the current document. | through 15 of the current document. | |||
o A revised introductory section regarding multi-server namespace | o A revised introductory section regarding multi-server namespace | |||
facilities is provided. | facilities is provided. | |||
o A more realistic treatment of server scope is provided, which | o A more realistic treatment of server scope is provided, which | |||
reflects the more limited co-ordination of locking state adopted | reflects the more limited co-ordination of locking state adopted | |||
by servers actually sharing a common server scope. | by servers actually sharing a common server scope. | |||
o Some confusing text regarding changes in server_owner needs to be | o Some confusing text regarding changes in server_owner needs to be | |||
clarified. | clarified. | |||
o The description of NFS4ERR_MOVED needs to be updated since two | o The description of NFS4ERR_MOVED needs to be updated since two | |||
different network access paths to the same file system are no | different network access paths to the same file system are no | |||
longer considered to be two instances of the same file system. | longer considered to be two instances of the same file system. | |||
o A new treatment of EXCHANGE_ID is needed, replacing that which | o A new treatment of EXCHANGE_ID is needed, replacing that which | |||
appeared in Section 18.35 of [RFC5661] | appeared in Section 18.35 of [RFC5661]. This is necessary since | |||
the existing treatment of client id confirmation does not make | ||||
sense in the context of transparent state migration, in which | ||||
client ids are transferred between source and destination servers. | ||||
o A new treatment of RECLAIM_COMPLETE is needed, replacing that | ||||
which appeared in Section 18.51 of [RFC5661]. This is necessary | ||||
to clarify the function of the one-fs flag and clarify how | ||||
existing clients, that might not properly use this flag, are to be | ||||
dealt with. | ||||
3.3. Relationship of this Document to RFC5661 | 3.3. Relationship of this Document to RFC5661 | |||
The role of this document is to explain and specify a set of needed | The role of this document is to explain and specify a set of needed | |||
changes to [RFC5661]. All of these changes are related to the multi- | changes to [RFC5661]. All of these changes are related to the multi- | |||
server namespace features of NFSv4.1. | server namespace features of NFSv4.1. | |||
This document contains sections that propose additions to and other | This document contains sections that propose additions to and other | |||
modifications of [RFC5661] as well as others that explain the reasons | modifications of [RFC5661] as well as others that explain the reasons | |||
for modifications but do not directly affect existing specifications. | for modifications but do not directly affect existing specifications. | |||
skipping to change at page 9, line 25 ¶ | skipping to change at page 10, line 14 ¶ | |||
o Editing sections contain some text that replaces text within | o Editing sections contain some text that replaces text within | |||
[RFC5661], although the entire section will not consist of such | [RFC5661], although the entire section will not consist of such | |||
text and will include other text as well. Such sections make | text and will include other text as well. Such sections make | |||
relatively minor adjustments in the existing NFSv4.1 specification | relatively minor adjustments in the existing NFSv4.1 specification | |||
which are expected to be reflected in an eventual consolidated | which are expected to be reflected in an eventual consolidated | |||
document. Generally such replacement text appears as a quotation, | document. Generally such replacement text appears as a quotation, | |||
which may take the form of an indented set of paragraphs. | which may take the form of an indented set of paragraphs. | |||
See Appendix A for a classification of the sections of this document | See Appendix A for a classification of the sections of this document | |||
according the categories above. | according to the categories above. | |||
When this document is approved and published, [RFC5661] would be | When this document is approved and published, [RFC5661] would be | |||
significantly updated with most of the changed sections within the | significantly updated with most of the changed sections within the | |||
current Section 11 of that document. A detailed discussion of the | current Section 11 of that document. A detailed discussion of the | |||
necessary updates can be found in Appendix B. | necessary updates can be found in Appendix B. | |||
4. Changes to Section 11 of RFC5661 | 4. Changes to Section 11 of RFC5661 | |||
A number of sections need to be revised, replacing existing sub- | A number of sections need to be revised, replacing existing sub- | |||
sections within Section 11 of [RFC5661]: | sections within Section 11 of [RFC5661]: | |||
skipping to change at page 10, line 11 ¶ | skipping to change at page 10, line 47 ¶ | |||
New material relating to the handling of the location attributes | New material relating to the handling of the location attributes | |||
is contained in Sections 4.5.1 and 4.5.7 below. | is contained in Sections 4.5.1 and 4.5.7 below. | |||
o A major replacement for the existing Section 11.7 of [RFC5661] | o A major replacement for the existing Section 11.7 of [RFC5661] | |||
entitled "Effecting File System Transitions", will appear as | entitled "Effecting File System Transitions", will appear as | |||
Sections 6 through 11 of the current document. The reasons for | Sections 6 through 11 of the current document. The reasons for | |||
the reorganization of this section into multiple sections are | the reorganization of this section into multiple sections are | |||
discussed below in Section 5 of the current document. | discussed below in Section 5 of the current document. | |||
o A replacement for the existing Section 11.10 of [RFC5661] entitled | ||||
"The Attribute fs_locations_info", will appear as Section 12.2 of | ||||
the current document, with Section 12.1 describing the differences | ||||
between the new section and the treatment within [RFC5661]. A | ||||
revised treatment is necessary because the existing treatment did | ||||
not make clear how the added attribute information relates to the | ||||
case of trunked paths to the same replica. These issues were not | ||||
addressed in [RFC5661] where the concepts of a replica and a | ||||
network path used to access a replica were not clearly | ||||
distinguished. | ||||
4.1. Multi-Server Namespace (as updated) | 4.1. Multi-Server Namespace (as updated) | |||
NFSv4.1 supports attributes that allow a namespace to extend beyond | NFSv4.1 supports attributes that allow a namespace to extend beyond | |||
the boundaries of a single server. It is desirable that clients and | the boundaries of a single server. It is desirable that clients and | |||
servers support construction of such multi-server namespaces. Use of | servers support construction of such multi-server namespaces. Use of | |||
such multi-server namespaces is OPTIONAL however, and for many | such multi-server namespaces is OPTIONAL however, and for many | |||
purposes, single-server namespaces are perfectly acceptable. Use of | purposes, single-server namespaces are perfectly acceptable. Use of | |||
multi-server namespaces can provide many advantages, by separating a | multi-server namespaces can provide many advantages, by separating a | |||
file system's logical position in a namespace from the (possibly | file system's logical position in a namespace from the (possibly | |||
changing) logistical and administrative considerations that result in | changing) logistical and administrative considerations that result in | |||
skipping to change at page 10, line 39 ¶ | skipping to change at page 11, line 37 ¶ | |||
by NFSv4 clients. Typically, this is done by assigning each file | by NFSv4 clients. Typically, this is done by assigning each file | |||
system a name within the pseudo-fs associated with the server, | system a name within the pseudo-fs associated with the server, | |||
although the pseudo-fs may be dispensed with if there is only a | although the pseudo-fs may be dispensed with if there is only a | |||
single exported file system. Each such file system is part of the | single exported file system. Each such file system is part of the | |||
server's local namespace, and can be considered as a file system | server's local namespace, and can be considered as a file system | |||
instance within a larger multi-server namespace. | instance within a larger multi-server namespace. | |||
o The set of all exported file systems for a given server | o The set of all exported file systems for a given server | |||
constitutes that server's local namespace. | constitutes that server's local namespace. | |||
o In some cases, a server will have a namespace, more extensive than | o In some cases, a server will have a namespace more extensive than | |||
its local namespace, by using features associated with attributes | its local namespace, by using features associated with attributes | |||
that provide location information. These features, which allow | that provide location information. These features, which allow | |||
construction of a multi-server namespace, are all described in | construction of a multi-server namespace, are all described in | |||
individual sections below and include referrals (described in | individual sections below and include referrals (described in | |||
Section 4.5.6), migration (described in Section 4.5.5), and | Section 4.5.6), migration (described in Section 4.5.5), and | |||
replication (described in Section 4.5.4). | replication (described in Section 4.5.4). | |||
o A file system present in a server's pseudo-fs may have multiple | o A file system present in a server's pseudo-fs may have multiple | |||
file system instances on different servers associated with it. | file system instances on different servers associated with it. | |||
All such instances are considered replicas of one another. | All such instances are considered replicas of one another. | |||
skipping to change at page 11, line 25 ¶ | skipping to change at page 12, line 25 ¶ | |||
location attributes. Each such entry specifies a server, in the | location attributes. Each such entry specifies a server, in the | |||
form of a host name or IP address, and an fs name, which | form of a host name or IP address, and an fs name, which | |||
designates the location of the file system within the server's | designates the location of the file system within the server's | |||
pseudo-fs. A location entry designates a set of server endpoints | pseudo-fs. A location entry designates a set of server endpoints | |||
to which the client may establish connections. There may be | to which the client may establish connections. There may be | |||
multiple endpoints because a host name may map to multiple network | multiple endpoints because a host name may map to multiple network | |||
addresses and because multiple connection types may be used to | addresses and because multiple connection types may be used to | |||
communicate with a single network address. However, all such | communicate with a single network address. However, all such | |||
endpoints MUST provide a way of connecting to a single server. | endpoints MUST provide a way of connecting to a single server. | |||
The exact form of the location entry varies with the particular | The exact form of the location entry varies with the particular | |||
location attribute used as described in Section 4.3. | location attribute used, as described in Section 4.3. | |||
o Location elements are derived from location entries and each | o Location elements are derived from location entries and each | |||
describes a particular network access path, consisting of a | describes a particular network access path, consisting of a | |||
network address and a location within the server's pseudo-fs. | network address and a location within the server's pseudo-fs. | |||
Location elements need not appear within a location attribute, but | Location elements need not appear within a location attribute, but | |||
the existence of each location element derives from a | the existence of each location element derives from a | |||
corresponding location entry. When a location entry specifies an | corresponding location entry. When a location entry specifies an | |||
IP address there is only a single corresponding location element. | IP address there is only a single corresponding location element. | |||
Location entries that contain a host name are resolved using DNS, | Location entries that contain a host name are resolved using DNS, | |||
and may result in one or more location elements. All location | and may result in one or more location elements. All location | |||
skipping to change at page 12, line 26 ¶ | skipping to change at page 13, line 26 ¶ | |||
namespace of one server can be associated with one or more instances | namespace of one server can be associated with one or more instances | |||
of that file system on other servers. These attributes contain | of that file system on other servers. These attributes contain | |||
location entries specifying a server address target (either as a DNS | location entries specifying a server address target (either as a DNS | |||
name representing one or more IP addresses or as a specific IP | name representing one or more IP addresses or as a specific IP | |||
address) together with the pathname of that file system within the | address) together with the pathname of that file system within the | |||
associated single-server namespace. | associated single-server namespace. | |||
The fs_locations_info RECOMMENDED attribute allows specification of | The fs_locations_info RECOMMENDED attribute allows specification of | |||
one or more file system instance locations where the data | one or more file system instance locations where the data | |||
corresponding to a given file system may be found. This attribute | corresponding to a given file system may be found. This attribute | |||
provides to the client, in to addition to specification of file | provides to the client, in addition to specification of file system | |||
system instance locations, other helpful information such as: | instance locations, other helpful information such as: | |||
o Information guiding choices among the various file system | o Information guiding choices among the various file system | |||
instances provided (e.g., priority for use, writability, currency, | instances provided (e.g., priority for use, writability, currency, | |||
etc.). | etc.). | |||
o Information to help the client efficiently effect as seamless a | o Information to help the client efficiently effect as seamless a | |||
transition as possible among multiple file system instances, when | transition as possible among multiple file system instances, when | |||
and if that should be necessary. | and if that should be necessary. | |||
o Information helping to guide the selection of the appropriate | o Information helping to guide the selection of the appropriate | |||
skipping to change at page 12, line 51 ¶ | skipping to change at page 13, line 51 ¶ | |||
entry corresponds to a location entry with the fls_server field | entry corresponds to a location entry with the fls_server field | |||
designating the server, with the location pathname within the | designating the server, with the location pathname within the | |||
server's pseudo-fs given by the fl_rootpath field of the encompassing | server's pseudo-fs given by the fl_rootpath field of the encompassing | |||
fs_locations_item4. | fs_locations_item4. | |||
The fs_locations attribute defined in NFSv4.0 is also a part of | The fs_locations attribute defined in NFSv4.0 is also a part of | |||
NFSv4.1. This attribute only allows specification of the file system | NFSv4.1. This attribute only allows specification of the file system | |||
locations where the data corresponding to a given file system may be | locations where the data corresponding to a given file system may be | |||
found. Servers should make this attribute available whenever | found. Servers should make this attribute available whenever | |||
fs_locations_info is supported, but client use of fs_locations_info | fs_locations_info is supported, but client use of fs_locations_info | |||
is preferable. | is preferable, as it provides more information. | |||
Within the fs_locations attribute, each fs_location4 contains a | Within the fs_locations attribute, each fs_location4 contains a | |||
location entry with the server field designating the server and the | location entry with the server field designating the server and the | |||
rootpath field giving the location pathname within the server's | rootpath field giving the location pathname within the server's | |||
pseudo-fs. | pseudo-fs. | |||
4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 | 4.4. Re-organization of Sections 11.4 and 11.5 of RFC5661 | |||
Previously, issues related to the fact that multiple location entries | Previously, issues related to the fact that multiple location entries | |||
directed the client to the same file system instance were dealt with | directed the client to the same file system instance were dealt with | |||
skipping to change at page 14, line 19 ¶ | skipping to change at page 15, line 19 ¶ | |||
server can be associated with a namespace defined by another server, | server can be associated with a namespace defined by another server, | |||
thus allowing a general multi-server namespace facility. A | thus allowing a general multi-server namespace facility. A | |||
designation of such a remote instance, in place of a file system | designation of such a remote instance, in place of a file system | |||
never previously present, is called a "pure referral" and is | never previously present, is called a "pure referral" and is | |||
discussed in Section 4.5.6 below. | discussed in Section 4.5.6 below. | |||
Because client support for location-related attributes is OPTIONAL, a | Because client support for location-related attributes is OPTIONAL, a | |||
server may (but is not required to) take action to hide migration and | server may (but is not required to) take action to hide migration and | |||
referral events from such clients, by acting as a proxy, for example. | referral events from such clients, by acting as a proxy, for example. | |||
The server can determine the presence of client support from the | The server can determine the presence of client support from the | |||
arguments of the EXCHANGE_ID operation (see Section 13.3 in the | arguments of the EXCHANGE_ID operation (see Section 14.3 in the | |||
current document). | current document). | |||
4.5.1. Combining Multiple Uses in a Single Attribute (to be added) | 4.5.1. Combining Multiple Uses in a Single Attribute (to be added) | |||
A location attribute will sometimes contain information relating to | A location attribute will sometimes contain information relating to | |||
the location of multiple replicas which may be used in different | the location of multiple replicas which may be used in different | |||
ways. | ways. | |||
o Location entries that relate to the file system instance currently | o Location entries that relate to the file system instance currently | |||
in use provide trunking information, allowing the client to find | in use provide trunking information, allowing the client to find | |||
skipping to change at page 15, line 38 ¶ | skipping to change at page 16, line 38 ¶ | |||
network addresses. It might use the latter form because of DNS- | network addresses. It might use the latter form because of DNS- | |||
related security concerns or because the set of addresses to be used | related security concerns or because the set of addresses to be used | |||
might require active management by the server. | might require active management by the server. | |||
Location entries used to discover candidate addresses for use in | Location entries used to discover candidate addresses for use in | |||
trunking are subject to change, as discussed in Section 4.5.7 below. | trunking are subject to change, as discussed in Section 4.5.7 below. | |||
The client may respond to such changes by using additional addresses | The client may respond to such changes by using additional addresses | |||
once they are verified or by ceasing to use existing ones. The | once they are verified or by ceasing to use existing ones. The | |||
server can force the client to cease using an address by returning | server can force the client to cease using an address by returning | |||
NFS4ERR_MOVED when that address is used to access a file system. | NFS4ERR_MOVED when that address is used to access a file system. | |||
This allows a transfer of access similar to migration, although the | This allows a transfer of client access which is similar to | |||
same file system instance is accessed throughout. | migration, although the same file system instance is accessed | |||
throughout. | ||||
4.5.3. Location Attributes and Connection Type Selection (to be added) | 4.5.3. Location Attributes and Connection Type Selection (to be added) | |||
Because of the need to support multiple connections, clients face the | Because of the need to support multiple connections, clients face the | |||
issue of determining the proper connection type to use when | issue of determining the proper connection type to use when | |||
establishing a connection to a given server network address. In some | establishing a connection to a given server network address. In some | |||
cases, this issue can be addressed through the use of the connection | cases, this issue can be addressed through the use of the connection | |||
"step-up" facility described in Section 18.16 of [RFC5661]. However, | "step-up" facility described in Section 18.16 of [RFC5661]. However, | |||
because there are cases in which that fcility is not available, the | because there are cases in which that facility is not available, the | |||
client may have to choose a connection type with no possibility of | client may have to choose a connection type with no possibility of | |||
changing it within the scope of a single connection. | changing it within the scope of a single connection. | |||
The two location attributes differ as to the information made | The two location attributes differ as to the information made | |||
available in this regard. Fs_locations provides no information to | available in this regard. Fs_locations provides no information to | |||
support connection type selection. As a result, clients supporting | support connection type selection. As a result, clients supporting | |||
multiple connection types need to attempt to establish a connection | multiple connection types would need to attempt to establish | |||
on multiple connection types until the one preferred by the client is | connections using multiple connection types until the one preferred | |||
successfully established. | by the client is successfully established. | |||
Fs_locations_info provides a flag, FSLI4TF_RDMA, indicating | Fs_locations_info provides a flag, FSLI4TF_RDMA, indicating | |||
that RPC-over-RDMA support is available using the specfied location | that RPC-over-RDMA support is available using the specified location | |||
entry. This flag makes it convenient for a client wishing to | entry. This flag makes it convenient for a client wishing to | |||
use RDMA to establish a TCP connection and then convert to use of | use RDMA to establish a TCP connection and then convert to use of | |||
RDMA. After establishing a TCP connection, the step-up facility can | RDMA. After establishing a TCP connection, the step-up facility can | |||
be used, if available, to convert that connection to RDMA mode. | be used, if available, to convert that connection to RDMA mode. | |||
Otherwise, if RDMA availability is indicated, a new RDMA connection | Otherwise, if RDMA availability is indicated, a new RDMA connection | |||
can be established and it can be bound to the sessiion already | can be established and it can be bound to the session already | |||
established by the TCP connection, allowing the TCP connection to be | established by the TCP connection, allowing the TCP connection to be | |||
dropped and the session converted to further use in RDMA mode. | dropped and the session converted to further use in RDMA mode. | |||
4.5.4. File System Replication (as updated) | 4.5.4. File System Replication (as updated) | |||
The fs_locations and fs_locations_info attributes provide alternative | The fs_locations and fs_locations_info attributes provide alternative | |||
locations, to be used to access data in place of or in addition to | locations, to be used to access data in place of or in addition to | |||
the current file system instance. On first access to a file system, | the current file system instance. On first access to a file system, | |||
the client should obtain the set of alternate locations by | the client should obtain the set of alternate locations by | |||
interrogating the fs_locations or fs_locations_info attribute, with | interrogating the fs_locations or fs_locations_info attribute, with | |||
skipping to change at page 17, line 4 ¶ | skipping to change at page 18, line 4 ¶ | |||
fs_locations and fs_locations_info attributes and how the client | fs_locations and fs_locations_info attributes and how the client | |||
deals with file system transition issues will be discussed in detail | deals with file system transition issues will be discussed in detail | |||
below. | below. | |||
4.5.5. File System Migration (as updated) | 4.5.5. File System Migration (as updated) | |||
When a file system is present and becomes absent, clients can be | When a file system is present and becomes absent, clients can be | |||
given the opportunity to have continued access to their data, at an | given the opportunity to have continued access to their data, at an | |||
alternate location, as specified by a location attribute. This | alternate location, as specified by a location attribute. This | |||
migration of access to another replica includes the ability to retain | migration of access to another replica includes the ability to retain | |||
locks across the transition, either by reclaim or by Transparent | locks across the transition, either by using lock reclaim or by | |||
State Migration. | taking advantage of Transparent State Migration. | |||
Typically, a client will be accessing the file system in question, | Typically, a client will be accessing the file system in question, | |||
get an NFS4ERR_MOVED error, and then use a location attribute to | get an NFS4ERR_MOVED error, and then use a location attribute to | |||
determine the new location of the data. When fs_locations_info is | determine the new location of the data. When fs_locations_info is | |||
used, additional information will be available that will define the | used, additional information will be available that will define the | |||
nature of the client's handling of the transition to a new server. | nature of the client's handling of the transition to a new server. | |||
Such migration can be helpful in providing load balancing or general | Such migration can be helpful in providing load balancing or general | |||
resource reallocation. The protocol does not specify how the file | resource reallocation. The protocol does not specify how the file | |||
system will be moved between servers. It is anticipated that a | system will be moved between servers. It is anticipated that a | |||
skipping to change at page 17, line 30 ¶ | skipping to change at page 18, line 30 ¶ | |||
The new location may be, in the case of various forms of server | The new location may be, in the case of various forms of server | |||
clustering, another server providing access to the same physical file | clustering, another server providing access to the same physical file | |||
system. The client's responsibilities in dealing with this | system. The client's responsibilities in dealing with this | |||
transition will depend on whether migration has occurred and the | transition will depend on whether migration has occurred and the | |||
means the server has chosen to provide continuity of locking state. | means the server has chosen to provide continuity of locking state. | |||
These issues will be discussed in detail below. | These issues will be discussed in detail below. | |||
Although a single successor location is typical, multiple locations | Although a single successor location is typical, multiple locations | |||
may be provided. When multiple locations are provided, the client | may be provided. When multiple locations are provided, the client | |||
use the first one provided. If that is inaccessible for some reason, | will typically use the first one provided. If that is inaccessible | |||
later ones can be used. In such cases the client might consider that | for some reason, later ones can be used. In such cases the client | |||
the transition to the new replica is a migration event, although it | might consider the transition to the new replica as a migration | |||
would lose access to locking state if it did so. | event, even though some of the servers involved might not be aware of | |||
the use of the server which was inaccessible. In such a case, a | ||||
client might lose access to locking state as a result of the access | ||||
transfer. | ||||
When an alternate location is designated as the target for migration, | When an alternate location is designated as the target for migration, | |||
it must designate the same data (with metadata being the same to the | it must designate the same data (with metadata being the same to the | |||
degree indicated by the fs_locations_info attribute). Where file | degree indicated by the fs_locations_info attribute). Where file | |||
systems are writable, a change made on the original file system must | systems are writable, a change made on the original file system must | |||
be visible on all migration targets. Where a file system is not | be visible on all migration targets. Where a file system is not | |||
writable but represents a read-only copy (possibly periodically | writable but represents a read-only copy (possibly periodically | |||
updated) of a writable file system, similar requirements apply to the | updated) of a writable file system, similar requirements apply to the | |||
propagation of updates. Any change visible in the original file | propagation of updates. Any change visible in the original file | |||
system must already be effected on all migration targets, to avoid | system must already be effected on all migration targets, to avoid | |||
any possibility that a client, in effecting a transition to the | any possibility that a client, in effecting a transition to the | |||
migration target, will see any reversion in file system state. | migration target, will see any reversion in file system state. | |||
4.5.6. Referrals (as updated) | 4.5.6. Referrals (as updated) | |||
Referrals allow the server to associate a file system located on one | Referrals allow the server to associate a file system namespace entry | |||
server with file system located on another server. When this | located on one server with a file system located on another server. | |||
includes the use of pure referrals, servers are provided a way of | When this includes the use of pure referrals, servers are provided a | |||
placing a file system in a location within the namespace essentially | way of placing a file system in a location within the namespace | |||
without respect to its physical location on a particular server. | essentially without respect to its physical location on a particular | |||
This allows a single server or a set of servers to present a multi- | server. This allows a single server or a set of servers to present a | |||
server namespace that encompasses file systems located on a wider | multi-server namespace that encompasses file systems located on a | |||
range of servers. Some likely uses of this facility include | wider range of servers. Some likely uses of this facility include | |||
establishment of site-wide or organization-wide namespaces, with the | establishment of site-wide or organization-wide namespaces, with the | |||
eventual possibility of combining such together into a truly global | eventual possibility of combining such together into a truly global | |||
namespace. | namespace. | |||
Referrals occur when a client determines, upon first referencing a | Referrals occur when a client determines, upon first referencing a | |||
position in the current namespace, that it is part of a new file | position in the current namespace, that it is part of a new file | |||
system and that the file system is absent. When this occurs, | system and that the file system is absent. When this occurs, | |||
typically by receiving the error NFS4ERR_MOVED, the actual location | typically upon receiving the error NFS4ERR_MOVED, the actual location | |||
or locations of the file system can be determined by fetching the | or locations of the file system can be determined by fetching a | |||
fs_locations or fs_locations_info attribute. | locations attribute. | |||
The locations-related attribute may designate a single file system | The locations attribute may designate a single file system location | |||
location or multiple file system locations, to be selected based on | or multiple file system locations, to be selected based on the needs | |||
the needs of the client. The server, in the fs_locations_info | of the client. The server, in the fs_locations_info attribute, may | |||
attribute, may specify priorities to be associated with various file | specify priorities to be associated with various file system location | |||
system location choices. The server may assign different priorities | choices. The server may assign different priorities to different | |||
to different locations as reported to individual clients, in order to | locations as reported to individual clients, in order to adapt to | |||
adapt to client physical location or to effect load balancing. When | client physical location or to effect load balancing. When both | |||
both read-only and read-write file systems are present, some of the | read-only and read-write file systems are present, some of the read- | |||
read-only locations might not be absolutely up-to-date (as they would | only locations might not be absolutely up-to-date (as they would have | |||
have to be in the case of replication and migration). Servers may | to be in the case of replication and migration). Servers may also | |||
also specify file system locations that include client-substituted | specify file system locations that include client-substituted | |||
variables so that different clients are referred to different file | variables so that different clients are referred to different file | |||
systems (with different data contents) based on client attributes | systems (with different data contents) based on client attributes | |||
such as CPU architecture. | such as CPU architecture. | |||
When the fs_locations_info attribute is such that there are | When the fs_locations_info attribute is such that there are | |||
multiple possible targets listed, the relationships among them may be | multiple possible targets listed, the relationships among them may be | |||
important to the client in selecting which one to use. The same | important to the client in selecting which one to use. The same | |||
rules specified in Section 4.5.5 below regarding multiple migration | rules specified in Section 4.5.5 below regarding multiple migration | |||
targets apply to these multiple replicas as well. For example, the | targets apply to these multiple replicas as well. For example, the | |||
client might prefer a writable target on a server that has additional | client might prefer a writable target on a server that has additional | |||
skipping to change at page 19, line 16 ¶ | skipping to change at page 20, line 21 ¶ | |||
providing a large set of pure referrals to all of the included file | providing a large set of pure referrals to all of the included file | |||
systems. Alternatively, a single multi-server namespace may be | systems. Alternatively, a single multi-server namespace may be | |||
administratively segmented with separate referral file systems (on | administratively segmented with separate referral file systems (on | |||
separate servers) for each separately administered portion of the | separate servers) for each separately administered portion of the | |||
namespace. The top-level referral file system or any segment may use | namespace. The top-level referral file system or any segment may use | |||
replicated referral file systems for higher availability. | replicated referral file systems for higher availability. | |||
Generally, multi-server namespaces are for the most part uniform, in | Generally, multi-server namespaces are for the most part uniform, in | |||
that the same data made available to one client at a given location | that the same data made available to one client at a given location | |||
in the namespace is made available to all clients at that location. | in the namespace is made available to all clients at that location. | |||
However, there are facilities provided that allow different clients | However, as described above, there are facilities provided that allow | |||
to be directed different sets of data, to enable adaptation to such | different clients to be directed different sets of data, to enable | |||
client characteristics as CPU architecture. | adaptation to such client characteristics as CPU architecture. | |||
4.5.7. Changes in a Location Attribute (to be added) | 4.5.7. Changes in a Location Attribute (to be added) | |||
Although clients will typically fetch a location attribute when first | Although clients will typically fetch a location attribute when first | |||
accessing a file system and when NFS4ERR_MOVED is returned, a client | accessing a file system and when NFS4ERR_MOVED is returned, a client | |||
can choose to fetch the attribute periodically, in which case, the | can choose to fetch the attribute periodically, in which case the | |||
value fetched may change over time. | value fetched may change over time. | |||
For clients not prepared to access multiple replicas simultaneously | For clients not prepared to access multiple replicas simultaneously | |||
(see Section 8.1 of the current document), the handling of the | (see Section 8.1 of the current document), the handling of the | |||
various cases of change are as follows: | various cases of change is as follows: | |||
o Changes in the list of replicas or in the network addresses | o Changes in the list of replicas or in the network addresses | |||
associated with replicas do not require immediate action. The | associated with replicas do not require immediate action. The | |||
client will typically update its list of replicas to reflect the | client will typically update its list of replicas to reflect the | |||
new information. | new information. | |||
o Additions to the list of network addresses for the current file | o Additions to the list of network addresses for the current file | |||
system instance need not be acted on promptly. However, the client | system instance need not be acted on promptly. However, the client | |||
can choose to use the new address whenever it needs to switch | can choose to use the new address whenever it needs to switch | |||
access to a new replica. | access to a new replica. | |||
skipping to change at page 20, line 12 ¶ | skipping to change at page 21, line 19 ¶ | |||
adjusting its access even in the absence of difficulties that would | adjusting its access even in the absence of difficulties that would | |||
lead to a new replica being selected. | lead to a new replica being selected. | |||
o When a new replica is added which may be accessed simultaneously | o When a new replica is added which may be accessed simultaneously | |||
with one currently in use, the client is free to use the new | with one currently in use, the client is free to use the new | |||
replica immediately. | replica immediately. | |||
o When a replica currently in use is deleted from the list, the | o When a replica currently in use is deleted from the list, the | |||
client need not cease using it immediately. However, since the | client need not cease using it immediately. However, since the | |||
server may subsequently force such use to cease (by returning | server may subsequently force such use to cease (by returning | |||
NFS4ERR_MOVED), clients can decide to limit the need for later | NFS4ERR_MOVED), clients might decide to limit the need for later | |||
state transfer. For example, new opens might be done on other | state transfer. For example, new opens might be done on other | |||
replicas, rather than on one not present in the list. | replicas, rather than on one not present in the list. | |||
5. Re-organization of Section 11.7 of RFC5661 | 5. Re-organization of Section 11.7 of RFC5661 | |||
The material in Section 11.7 of [RFC5661] has been reorganized and | The material in Section 11.7 of [RFC5661] has been reorganized and | |||
augmented as specified below: | augmented as specified below: | |||
o Because there can be a shift of the network access paths used to | o Because there can be a shift of the network access paths used to | |||
access a file system instance without any shift between replicas, | access a file system instance without any shift between replicas, | |||
skipping to change at page 22, line 11 ¶ | skipping to change at page 23, line 19 ¶ | |||
o When there is no potential replacement address in use and there | o When there is no potential replacement address in use and there | |||
are no valid addresses session-trunkable with the one whose use is | are no valid addresses session-trunkable with the one whose use is | |||
to be discontinued, other server-trunkable addresses may be used | to be discontinued, other server-trunkable addresses may be used | |||
to provide continued access. Although use of CREATE_SESSION is | to provide continued access. Although use of CREATE_SESSION is | |||
available to provide continued access to the existing instance, | available to provide continued access to the existing instance, | |||
servers have the option of providing continued access to the | servers have the option of providing continued access to the | |||
existing session through the new network access path in a fashion | existing session through the new network access path in a fashion | |||
similar to that provided by session migration (see Section 9 of | similar to that provided by session migration (see Section 9 of | |||
the current document). To take advantage of this possibility, | the current document). To take advantage of this possibility, | |||
clients can perform an initial BIND_CONN_TO_SESSION, as in the | clients can perform an initial BIND_CONN_TO_SESSION, as in the | |||
previous case, and use CREATE_SESSION only when that fails. | previous case, and use CREATE_SESSION only if that fails. | |||
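
The fallback order described in the bullet above can be sketched as follows. This is a minimal illustration, not client code from the draft: `bind_conn_to_session` and `create_session` are hypothetical callables standing in for whatever a client uses to issue the BIND_CONN_TO_SESSION and CREATE_SESSION operations over the new network access path.

```python
from typing import Callable


def restore_access(bind_conn_to_session: Callable[[], bool],
                   create_session: Callable[[], None]) -> str:
    """Try to keep the existing session; create a new one only on failure.

    Both callables are hypothetical stand-ins (assumptions of this sketch).
    bind_conn_to_session() returns True if the server accepted the binding.
    """
    # First attempt: bind the new connection to the existing session.
    if bind_conn_to_session():
        return "existing-session"
    # Fallback: the server did not carry the session over to this path.
    create_session()
    return "new-session"
```

Trying the bind first costs one extra round trip when the server does not support continued session access, but preserves the session (and its slot state) when it does.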
8. Effecting File System Transitions (as updated) | 8. Effecting File System Transitions (as updated) | |||
There are a range of situations in which there is a change to be | There are a range of situations in which there is a change to be | |||
effected in the set of replicas used to access a particular file | effected in the set of replicas used to access a particular file | |||
system. Some of these may involve an expansion or contraction of the | system. Some of these may involve an expansion or contraction of the | |||
set of replicas used as discussed in Section 8.1 below. | set of replicas used as discussed in Section 8.1 below. | |||
For reasons explained in that section, most transitions will involve | For reasons explained in that section, most transitions will involve | |||
a transition from a single replica to a corresponding replacement | a transition from a single replica to a corresponding replacement | |||
skipping to change at page 23, line 8 ¶ | skipping to change at page 24, line 17 ¶ | |||
effective continuity of locking state are discussed in Section 10 | effective continuity of locking state are discussed in Section 10 | |||
of the current document. | of the current document. | |||
o The servers' (source and destination) responsibilities in | o The servers' (source and destination) responsibilities in | |||
effecting Transparent Migration of locking and session state are | effecting Transparent Migration of locking and session state are | |||
discussed in Section 11 of the current document. | discussed in Section 11 of the current document. | |||
8.1. File System Transitions and Simultaneous Access (as updated) | 8.1. File System Transitions and Simultaneous Access (as updated) | |||
The fs_locations_info attribute (described in Section 11.10.1 of | The fs_locations_info attribute (described in Section 11.10.1 of | |||
[RFC5661]) may indicate that two replicas may be used simultaneously | [RFC5661] and Section 12.2 of this document) may indicate that two | |||
(see Section 11.7.2.1 of [RFC5661] for details). Although situations | replicas may be used simultaneously (see Section 11.7.2.1 of | |||
in which multiple replicas may be accessed simultaneously are | [RFC5661] for details). Although situations in which multiple | |||
somewhat similar to those in which a single replica is accessed by | replicas may be accessed simultaneously are somewhat similar to those | |||
multiple network addresses, there are important differences, since | in which a single replica is accessed by multiple network addresses, | |||
locking state is not shared among multiple replicas. | there are important differences, since locking state is not shared | |||
among multiple replicas. | ||||
Because of this difference in state handling, many clients will not | Because of this difference in state handling, many clients will not | |||
have the ability to take advantage of the fact that such replicas | have the ability to take advantage of the fact that such replicas | |||
represent the same data. Such clients will not be prepared to use | represent the same data. Such clients will not be prepared to use | |||
multiple replicas simultaneously but will access each file system | multiple replicas simultaneously but will access each file system | |||
using only a single replica, although the replica selected may make | using only a single replica, although the replica selected might make | |||
multiple server-trunkable addresses available. | multiple server-trunkable addresses available. | |||
Clients who are prepared to use multiple replicas simultaneously will | Clients who are prepared to use multiple replicas simultaneously will | |||
divide opens among replicas however they choose. Once that choice is | divide opens among replicas however they choose. Once that choice is | |||
made, any subsequent transitions will treat the set of locking state | made, any subsequent transitions will treat the set of locking state | |||
associated with each replica as a single entity. | associated with each replica as a single entity. | |||
For example, if one of the replicas becomes unavailable, access will | For example, if one of the replicas becomes unavailable, access will | |||
be transferred to a different replica, also capable of simultaneous | be transferred to a different replica, also capable of simultaneous | |||
access with the one still in use. | access with the one still in use. | |||
skipping to change at page 25, line 38 ¶ | skipping to change at page 26, line 49 ¶ | |||
possible. | possible. | |||
Although normally a single source file system will transition to a | Although normally a single source file system will transition to a | |||
single target file system, there is a provision for splitting a | single target file system, there is a provision for splitting a | |||
single source file system into multiple target file systems, by | single source file system into multiple target file systems, by | |||
specifying the FSLI4F_MULTI_FS flag. | specifying the FSLI4F_MULTI_FS flag. | |||
8.4.1. File System Splitting (as updated) | 8.4.1. File System Splitting (as updated) | |||
When a file system transition is made and the fs_locations_info | When a file system transition is made and the fs_locations_info | |||
indicates that the file system in question may be split into multiple | indicates that the file system in question might be split into | |||
file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do | multiple file systems (via the FSLI4F_MULTI_FS flag), the client | |||
GETATTRs to determine the fsid attribute on all known objects within | SHOULD do GETATTRs to determine the fsid attribute on all known | |||
the file system undergoing transition to determine the new file | objects within the file system undergoing transition to determine the | |||
system boundaries. | new file system boundaries. | |||
Clients may maintain the fsids passed to existing applications by | Clients might choose to maintain the fsids passed to existing | |||
mapping all of the fsids for the descendant file systems to the | applications by mapping all of the fsids for the descendant file | |||
common fsid used for the original file system. | systems to the common fsid used for the original file system. | |||
Splitting a file system may be done on a transition between file | Splitting a file system can be done on a transition between file | |||
systems of the same fileid class, since the fact that fileids are | systems of the same fileid class, since the fact that fileids are | |||
unique within the source file system ensures they will be unique in | unique within the source file system ensures they will be unique in | |||
each of the target file systems. | each of the target file systems. | |||
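
One way a client might keep application-visible fsids stable across a split is sketched below. The class name, method names, and the representation of an fsid as a `(major, minor)` tuple are assumptions made for illustration; the draft does not prescribe any particular data structure.

```python
from typing import Set, Tuple

Fsid = Tuple[int, int]  # (major, minor); representation assumed for this sketch


class SplitFsidMap:
    """Maps fsids of descendant file systems back to the original fsid."""

    def __init__(self, original: Fsid) -> None:
        self.original = original
        self.descendants: Set[Fsid] = set()

    def record(self, observed: Fsid) -> None:
        # Called with the fsid returned by GETATTR on a known object
        # after a transition flagged with FSLI4F_MULTI_FS.
        if observed != self.original:
            self.descendants.add(observed)

    def visible(self, fsid: Fsid) -> Fsid:
        # Applications keep seeing the fsid they were originally given.
        return self.original if fsid in self.descendants else fsid
```

Because fileids remain unique across the targets of a split within one fileid class, collapsing the descendant fsids this way does not create (fsid, fileid) collisions for existing applications.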
8.5. The Change Attribute and File System Transitions (as updated) | 8.5. The Change Attribute and File System Transitions (as updated) | |||
Since the change attribute is defined as a server-specific one, | Since the change attribute is defined as a server-specific one, | |||
change attributes fetched from one server are normally presumed to be | change attributes fetched from one server are normally presumed to be | |||
invalid on another server. Such a presumption is troublesome since | invalid on another server. Such a presumption is troublesome since | |||
it would invalidate all cached change attributes, requiring | it would invalidate all cached change attributes, requiring | |||
skipping to change at page 26, line 27 ¶ | skipping to change at page 27, line 38 ¶ | |||
happening to result in an identical change value. | happening to result in an identical change value. | |||
When the two file systems have consistent change attribute formats, | When the two file systems have consistent change attribute formats, | |||
and this fact is communicated to the client by reporting in the same | and this fact is communicated to the client by reporting in the same | |||
change class, the client may assume a continuity of change attribute | change class, the client may assume a continuity of change attribute | |||
construction and handle this situation just as it would be handled | construction and handle this situation just as it would be handled | |||
without any file system transition. | without any file system transition. | |||
8.6. Write Verifiers and File System Transitions (as updated) | 8.6. Write Verifiers and File System Transitions (as updated) | |||
In a file system transition, the two file systems may be clustered in | In a file system transition, the two file systems might be clustered | |||
the handling of unstably written data. When this is the case, and | in the handling of unstably written data. When this is the case, and | |||
the two file systems belong to the same write-verifier class, write | the two file systems belong to the same write-verifier class, write | |||
verifiers returned from one system may be compared to those returned | verifiers returned from one system may be compared to those returned | |||
by the other and superfluous writes avoided. | by the other and superfluous writes avoided. | |||
When two file systems belong to different write-verifier classes, any | When two file systems belong to different write-verifier classes, any | |||
verifier generated by one must not be compared to one provided by the | verifier generated by one must not be compared to one provided by the | |||
other. Instead, the two verifiers should be treated as not equal | other. Instead, the two verifiers should be treated as not equal | |||
even when the values are identical. | even when the values are identical. | |||
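
The comparison rule for write verifiers can be sketched as a small predicate. The function name and the use of plain integers for write-verifier classes are assumptions of this sketch; how a client obtains the class values from fs_locations_info is not shown.

```python
def write_verifiers_equal(class_a: int, verifier_a: bytes,
                          class_b: int, verifier_b: bytes) -> bool:
    """Compare verifiers from two file systems involved in a transition.

    Verifiers from different write-verifier classes are treated as
    unequal even when their byte values happen to be identical.
    """
    if class_a != class_b:
        return False
    return verifier_a == verifier_b
```

A client would resend unstable writes whenever this predicate returns False, and could skip the superfluous writes only when it returns True.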
8.7. Readdir Cookies and Verifiers and File System Transitions (as | 8.7. Readdir Cookies and Verifiers and File System Transitions (as | |||
updated) | updated) | |||
In a file system transition, the two file systems may be consistent | In a file system transition, the two file systems might be consistent | |||
in their handling of READDIR cookies and verifiers. When this is the | in their handling of READDIR cookies and verifiers. When this is the | |||
case, and the two file systems belong to the same readdir class, | case, and the two file systems belong to the same readdir class, | |||
READDIR cookies and verifiers from one system may be recognized by | READDIR cookies and verifiers from one system may be recognized by | |||
the other and READDIR operations started on one server may be validly | the other and READDIR operations started on one server may be validly | |||
continued on the other, simply by presenting the cookie and verifier | continued on the other, simply by presenting the cookie and verifier | |||
returned by a READDIR operation done on the first file system to the | returned by a READDIR operation done on the first file system to the | |||
second. | second. | |||
When two file systems belong to different readdir classes, any | When two file systems belong to different readdir classes, any | |||
READDIR cookie and verifier generated by one is not valid on the | READDIR cookie and verifier generated by one is not valid on the | |||
skipping to change at page 28, line 33 ¶ | skipping to change at page 29, line 47 ¶ | |||
8.9. Lock State and File System Transitions (as updated) | 8.9. Lock State and File System Transitions (as updated) | |||
While accessing a file system, clients obtain locks enforced by the | While accessing a file system, clients obtain locks enforced by the | |||
server which may prevent actions by other clients that are | server which may prevent actions by other clients that are | |||
inconsistent with those locks. | inconsistent with those locks. | |||
When access is transferred between replicas, clients need to be | When access is transferred between replicas, clients need to be | |||
assured that the actions disallowed by holding these locks cannot | assured that the actions disallowed by holding these locks cannot | |||
have occurred during the transition. This can be ensured by the | have occurred during the transition. This can be ensured by the | |||
methods below. If at least one of these is not implemented, clients | methods below. Unless at least one of these is implemented, clients | |||
will not be assured of continuity of lock possession across a | will not be assured of continuity of lock possession across a | |||
migration event. | migration event. | |||
o Providing the client an opportunity to re-obtain its locks via a | o Providing the client an opportunity to re-obtain its locks via a | |||
per-fs grace period on the destination server. Because the lock | per-fs grace period on the destination server. Because the lock | |||
reclaim mechanism was originally defined to support server reboot, | reclaim mechanism was originally defined to support server reboot, | |||
it implicitly assumes that file handles on reclaim will be | it implicitly assumes that file handles on reclaim will be | |||
the same as those at open. In the case of migration this requires | the same as those at open. In the case of migration, this | |||
that source and destination servers use the same filehandles, as | requires that source and destination servers use the same | |||
evidenced by using the same server scope (see Section 12.2 of the | filehandles, as evidenced by using the same server scope (see | |||
current document) or by showing this agreement using | Section 13.2 of the current document) or by showing this agreement | |||
fs_locations_info (see Section 8.2 above). | using fs_locations_info (see Section 8.2 above). | |||
o Transferring locking state as part of the transition as described | o Locking state can be transferred as part of the transition by | |||
in Section 9 of the current document to provide Transparent State | providing Transparent State Migration as described in Section 9 of | |||
Migration. | the current document. | |||
Of these, Transparent State Migration provides the smoother | Of these, Transparent State Migration provides the smoother | |||
experience for clients in that there is no grace-period-based delay | experience for clients in that there is no grace-period-based delay | |||
before new locks can be obtained. However, it requires a greater | before new locks can be obtained. However, it requires a greater | |||
degree of inter-server co-ordination. In general, the servers taking | degree of inter-server co-ordination. In general, the servers taking | |||
part in migration are free to provide either facility. However, when | part in migration are free to provide either facility. However, when | |||
the filehandles can differ across the migration event, Transparent | the filehandles can differ across the migration event, Transparent | |||
State Migration is the only available means of providing the needed | State Migration is the only available means of providing the needed | |||
functionality. | functionality. | |||
It should be noted that these two methods are not mutually exclusive | It should be noted that these two methods are not mutually exclusive | |||
and that a server might well provide both. In particular, if there | and that a server might well provide both. In particular, if there | |||
is some circumstance preventing a specific lock from being | is some circumstance preventing a specific lock from being | |||
transferred transparently, the server can allow it to be reclaimed. | transferred transparently, the destination server can allow it to be | |||
reclaimed, by implementing a per-fs grace period for the migrated | ||||
file system. | ||||
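
The constraints on the two methods discussed in this section can be summarized in a short sketch. The function and its three boolean parameters are hypothetical; they simply encode the statements above that reclaim requires matching filehandles, and that a server may offer both methods.

```python
from typing import List


def lock_continuity_methods(filehandles_preserved: bool,
                            per_fs_grace_period: bool,
                            transparent_state_migration: bool) -> List[str]:
    """List the methods able to assure lock continuity for a migration."""
    methods: List[str] = []
    if transparent_state_migration:
        # Usable whether or not filehandles differ across the migration.
        methods.append("transparent_state_migration")
    if per_fs_grace_period and filehandles_preserved:
        # Reclaim assumes the filehandles match those used at open.
        methods.append("reclaim_in_grace_period")
    return methods
```

An empty result corresponds to the case in which clients are not assured of continuity of lock possession across the migration event.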
9. Transferring State upon Migration (to be added) | 9. Transferring State upon Migration (to be added) | |||
When the transition is a result of a server-initiated decision to | When the transition is a result of a server-initiated decision to | |||
transition access and the source and destination servers have | transition access and the source and destination servers have | |||
implemented appropriate co-operation, it is possible to: | implemented appropriate co-operation, it is possible to: | |||
o Transfer locking state from the source to the destination server, | o Transfer locking state from the source to the destination server, | |||
in a fashion similar to that provide by Transparent State | in a fashion similar to that provided by Transparent State | |||
Migration in NFSv4.0, as described in [RFC7931]. Server | Migration in NFSv4.0, as described in [RFC7931]. Server | |||
responsibilities are described in Section 11.1 of the current | responsibilities are described in Section 11.2 of the current | |||
document. | document. | |||
o Transfer session state from the source to the destination server. | o Transfer session state from the source to the destination server. | |||
Server responsibilities in effecting such a transfer are described | Server responsibilities in effecting such a transfer are described | |||
in Section 11.2 of the current document. | in Section 11.3 of the current document. | |||
The means by which the client determines which of these transfer | The means by which the client determines which of these transfer | |||
events has occurred are described in Section 10 of the current | events has occurred are described in Section 10 of the current | |||
document. | document. | |||
9.1. Transparent State Migration and pNFS (to be added) | 9.1. Transparent State Migration and pNFS (to be added) | |||
When pNFS is involved, the protocol is capable of supporting: | When pNFS is involved, the protocol is capable of supporting: | |||
o Migration of the Metadata Server (MDS), leaving the Data Servers | o Migration of the Metadata Server (MDS), leaving the Data Servers | |||
skipping to change at page 30, line 42 ¶ | skipping to change at page 32, line 11 ¶ | |||
Migration may transfer a file system from a server which does not | Migration may transfer a file system from a server which does not | |||
support pNFS to one which does. In order to properly adapt to this | support pNFS to one which does. In order to properly adapt to this | |||
situation, clients which support pNFS, but function adequately in its | situation, clients which support pNFS, but function adequately in its | |||
absence, should check for pNFS support when a file system is migrated | absence, should check for pNFS support when a file system is migrated | |||
and be prepared to use pNFS when support is available on the | and be prepared to use pNFS when support is available on the | |||
destination. | destination. | |||
10. Client Responsibilities when Access is Transitioned (to be added) | 10. Client Responsibilities when Access is Transitioned (to be added) | |||
For a client to respond to an access transition, it must be made | For a client to respond to an access transition, it must become aware | |||
aware of it. The ways in which this can happen are discussed in | of it. The ways in which this can happen are discussed in | |||
Section 10.1 which discusses indications that a specific file system | Section 10.1 which discusses indications that a specific file system | |||
access path has transitioned as well as situations in which | access path has transitioned as well as situations in which | |||
additional activity is necessary to determine the set of file systems | additional activity is necessary to determine the set of file systems | |||
that have been migrated. Section 10.2 goes on to complete the | that have been migrated. Section 10.2 goes on to complete the | |||
discussion of how the set of migrated file systems might be | discussion of how the set of migrated file systems might be | |||
determined. Sections 10.3 through 10.5 discuss how the client should | determined. Sections 10.3 through 10.5 discuss how the client should | |||
deal with each transition it becomes aware of, either directly or as | deal with each transition it becomes aware of, either directly or as | |||
a result of migration discovery. | a result of migration discovery. | |||
The following terms are used to describe client activities: | The following terms are used to describe client activities: | |||
skipping to change at page 32, line 15 ¶ | skipping to change at page 33, line 35 ¶ | |||
to respond by using the location information to access the file | to respond by using the location information to access the file | |||
system at its new location to ensure that leases are not | system at its new location to ensure that leases are not | |||
needlessly expired. | needlessly expired. | |||
Unlike the case of NFSv4.0, in which the corresponding conditions are | Unlike the case of NFSv4.0, in which the corresponding conditions are | |||
both errors and thus mutually exclusive, in NFSv4.1 the client can, | both errors and thus mutually exclusive, in NFSv4.1 the client can, | |||
and often will, receive both indications on the same request. As a | and often will, receive both indications on the same request. As a | |||
result, implementations need to address the question of how to co- | result, implementations need to address the question of how to co- | |||
ordinate the necessary recovery actions when both indications arrive | ordinate the necessary recovery actions when both indications arrive | |||
in the response to the same request. It should be noted that when | in the response to the same request. It should be noted that when | |||
processing an NFSv4 COMPOUND, the server decides whether | processing an NFSv4 COMPOUND, the server will normally decide whether | |||
SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file | SEQ4_STATUS_LEASE_MOVED is to be set before it determines which file | |||
system will be referenced or whether NFS4ERR_MOVED is to be returned. | system will be referenced or whether NFS4ERR_MOVED is to be returned. | |||
Since these indications are not mutually exclusive in NFSv4.1, the | Since these indications are not mutually exclusive in NFSv4.1, the | |||
following combinations are possible results when a COMPOUND is | following combinations are possible results when a COMPOUND is | |||
issued: | issued: | |||
o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED | o The COMPOUND status is NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED | |||
is asserted. | is asserted. | |||
skipping to change at page 33, line 16 ¶ | skipping to change at page 34, line 33 ¶ | |||
file system(s) accessed by the request. However, to prevent | file system(s) accessed by the request. However, to prevent | |||
avoidable lease expiration, migration discovery needs to be done. | avoidable lease expiration, migration discovery needs to be done. | |||
o The COMPOUND status is not NFS4ERR_MOVED and | o The COMPOUND status is not NFS4ERR_MOVED and | |||
SEQ4_STATUS_LEASE_MOVED is clear. | SEQ4_STATUS_LEASE_MOVED is clear. | |||
In this case, neither transition-related activity nor migration | In this case, neither transition-related activity nor migration | |||
discovery is required. | discovery is required. | |||
Note that the specified actions only need to be taken if they are not | Note that the specified actions only need to be taken if they are not | |||
already going on. For example NFS4ERR_MOVED on a file system for | already going on. For example, when NFS4ERR_MOVED is received when | |||
which transition recovery is already going on merely waits for that | accessing a file system for which transition recovery is already going | |||
recovery to be completed while SEQ4_STATUS_LEASE_MOVED only needs to | on, the client merely waits for that recovery to be completed while | |||
the receipt of SEQ4_STATUS_LEASE_MOVED indication only needs to | ||||
initiate migration discovery for a server if it is not going on for | initiate migration discovery for a server if it is not going on for | |||
that server. | that server. | |||
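
A simplified sketch of how a client might turn the two indications into actions, while avoiding duplicate work, is shown below. The numeric values are those assigned to NFS4ERR_MOVED and SEQ4_STATUS_LEASE_MOVED in RFC 5661; the function name, the action strings, and the two "already running" flags are assumptions of this sketch. It starts migration discovery whenever the flag is set and no discovery is running, whereas a real client might choose to defer discovery until transition recovery completes.

```python
from typing import List

NFS4ERR_MOVED = 10019                 # nfsstat4 value from RFC 5661
SEQ4_STATUS_LEASE_MOVED = 0x00000080  # sr_status_flags bit from RFC 5661


def actions_for_response(status: int, sr_status_flags: int,
                         recovery_running: bool,
                         discovery_running: bool) -> List[str]:
    """Decide which activities to start for one COMPOUND response.

    recovery_running:  transition recovery already active for this file system
    discovery_running: migration discovery already active for this server
    """
    actions: List[str] = []
    if status == NFS4ERR_MOVED and not recovery_running:
        actions.append("start_transition_recovery")
    if (sr_status_flags & SEQ4_STATUS_LEASE_MOVED) and not discovery_running:
        actions.append("start_migration_discovery")
    return actions
```

When recovery for the file system is already in progress, the request that received NFS4ERR_MOVED simply waits for it, which the sketch represents by returning no new action.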
The fact that a lease-migrated condition does not result in an error | The fact that a lease-migrated condition does not result in an error | |||
in NFSv4.1 has a number of important consequences. In addition to | in NFSv4.1 has a number of important consequences. In addition to | |||
the fact, discussed above, that the two indications are not mutually | the fact, discussed above, that the two indications are not mutually | |||
exclusive, there are a number of issues that are important in | exclusive, there are a number of issues that are important in | |||
considering implementation of migration discovery, as discussed in | considering implementation of migration discovery, as discussed in | |||
Section 10.2. | Section 10.2. | |||
skipping to change at page 34, line 26 ¶ | skipping to change at page 35, line 43 ¶ | |||
o For such indications received in all other contexts, the | o For such indications received in all other contexts, the | |||
appropriate response is to initiate or otherwise provide for the | appropriate response is to initiate or otherwise provide for the | |||
execution of migration discovery for file systems associated with | execution of migration discovery for file systems associated with | |||
the server IP address returning the indication. | the server IP address returning the indication. | |||
This leaves a potential difficulty in situations in which the | This leaves a potential difficulty in situations in which the | |||
migration discovery process is near to completion but is still | migration discovery process is near to completion but is still | |||
operating. One should not ignore a LEASE_MOVED indication if the | operating. One should not ignore a LEASE_MOVED indication if the | |||
migration discovery process is not able to respond to the discovery | migration discovery process is not able to respond to the discovery | |||
of additional migrating file system without additional aid. A | of additional migrating file systems without additional aid. A | |||
further complexity relevant in addressing such situations is that a | further complexity relevant in addressing such situations is that a | |||
lease-migrated indication may reflect the server's state at the time | lease-migrated indication may reflect the server's state at the time | |||
the SEQUENCE operation was processed, which may be different from | the SEQUENCE operation was processed, which may be different from | |||
that in effect at the time the response is received. Because new | that in effect at the time the response is received. Because new | |||
migration events may occur at any time, and because a LEASE_MOVED | migration events may occur at any time, and because a LEASE_MOVED | |||
indication may reflect the situation in effect a considerable time | indication may reflect the situation in effect a considerable time | |||
before the indication is received, special care needs to be taken to | before the indication is received, special care needs to be taken to | |||
ensure that LEASE_MOVED indications are not inappropriately ignored. | ensure that LEASE_MOVED indications are not inappropriately ignored. | |||
A useful approach to this issue involves the use of separate | A useful approach to this issue involves the use of separate | |||
skipping to change at page 35, line 22 ¶ | skipping to change at page 36, line 39 ¶ | |||
STATUS4_REFERRAL) and thus that it is likely that the fetch of the | STATUS4_REFERRAL) and thus that it is likely that the fetch of the | |||
location attribute has cleared one of the file systems contributing | location attribute has cleared one of the file systems contributing | |||
to the lease-migrated indication. | to the lease-migrated indication. | |||
o In cases in which that happened, the thread cannot know whether | o In cases in which that happened, the thread cannot know whether | |||
the lease-migrated indication has been cleared and so it enters | the lease-migrated indication has been cleared and so it enters | |||
the completion/verification state and proceeds to issue a COMPOUND | the completion/verification state and proceeds to issue a COMPOUND | |||
to see if the LEASE_MOVED indication has been cleared. | to see if the LEASE_MOVED indication has been cleared. | |||
o When the discovery process is in the completion/verification | o When the discovery process is in the completion/verification | |||
state, if others get a lease-migrated indication they note the it | state, if other requests get a lease-migrated indication they note | |||
was received and the existence of such indications is used when | that it was received and the existence of such indications is used | |||
the request completes, as described below. | when the request completes, as described below. | |||
When the request used in the completion/verification state completes: | When the request used in the completion/verification state completes: | |||
o If a lease-migrated indication is returned, the discovery | o If a lease-migrated indication is returned, the discovery | |||
continues normally. Note that this is so even if all file systems | continues normally. Note that this is so even if all file systems | |||
have been traversed, since new migrations could have occurred while the | have been traversed, since new migrations could have occurred while the | |||
process was going on. | process was going on. | |||
o Otherwise, if there is any record that other requests saw a lease- | o Otherwise, if there is any record that other requests saw a lease- | |||
migrated indication, that record is cleared and the verification | migrated indication while the request was going on, that record is | |||
request retried. The discovery process remains in completion/ | cleared and the verification request retried. The discovery | |||
verification state. | process remains in completion/verification state. | |||
o If there have been no lease-migrated indications, the work of | o If there have been no lease-migrated indications, the work of | |||
migration discovery is considered completed and it enters the non- | migration discovery is considered completed and it enters the non- | |||
operating state. Once it enters this state, subsequent lease- | operating state. Once it enters this state, subsequent lease- | |||
migrated indications will trigger a new migration discovery | migrated indications will trigger a new migration discovery | |||
process. | process. | |||
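The handling of a completed verification request can be sketched as a small state machine. This is a minimal illustration only, not part of either draft: the class, method, and state names are hypothetical, and the choice to clear the record of other requests' indications on entry to the completion/verification state is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    NON_OPERATING = auto()
    DISCOVERING = auto()
    COMPLETION_VERIFICATION = auto()


class MigrationDiscovery:
    """Hypothetical client-side tracker for the migration discovery process."""

    def __init__(self):
        self.state = State.NON_OPERATING
        # Record that other requests saw a lease-migrated indication
        # while the verification request was outstanding.
        self.others_saw_lease_moved = False

    def enter_verification(self):
        # Assumption: the record starts out clear for each verification pass.
        self.others_saw_lease_moved = False
        self.state = State.COMPLETION_VERIFICATION

    def note_lease_moved(self):
        """Called when some other request receives a lease-migrated indication."""
        if self.state is State.COMPLETION_VERIFICATION:
            self.others_saw_lease_moved = True
        elif self.state is State.NON_OPERATING:
            # A new indication triggers a new migration discovery process.
            self.state = State.DISCOVERING

    def verification_completed(self, lease_moved_returned):
        """Apply the three completion rules; return the action to take."""
        if self.state is not State.COMPLETION_VERIFICATION:
            raise RuntimeError("no verification request outstanding")
        if lease_moved_returned:
            self.state = State.DISCOVERING
            return "continue-discovery"
        if self.others_saw_lease_moved:
            self.others_saw_lease_moved = False
            return "retry-verification"  # state is unchanged
        self.state = State.NON_OPERATING
        return "done"
```

Because a retry leaves the tracker in the completion/verification state, a steady stream of new indications keeps it there, which is the non-termination concern raised in the text that follows.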
It should be noted that the process described above is not guaranteed | It should be noted that the process described above is not guaranteed | |||
to terminate, as a long series of new migration events might | to terminate, as a long series of new migration events might | |||
continually delay the clearing of the LEASE_MOVED indication. To | continually delay the clearing of the LEASE_MOVED indication. To | |||
skipping to change at page 36, line 34 ¶ | skipping to change at page 37, line 50 ¶ | |||
During the first phase of this process, the client proceeds to | During the first phase of this process, the client proceeds to | |||
examine location entries to find the initial network address it will | examine location entries to find the initial network address it will | |||
use to continue access to the file system or its replacement. For | use to continue access to the file system or its replacement. For | |||
each location entry that the client examines, the process consists of | each location entry that the client examines, the process consists of | |||
five steps: | five steps: | |||
1. Performing an EXCHANGE_ID directed at the location address. This | 1. Performing an EXCHANGE_ID directed at the location address. This | |||
operation is used to register the client-owner with the server, | operation is used to register the client-owner with the server, | |||
to obtain a client ID to be used subsequently to communicate with | to obtain a client ID to be used subsequently to communicate with | |||
it, to obtain tat client ID's confirmation status and, to | it, to obtain that client ID's confirmation status, and to | |||
determine server_owner and scope for the purpose of determining | determine server_owner and scope for the purpose of determining | |||
if the entry is trunkable with that previously being used to | if the entry is trunkable with that previously being used to | |||
access the file system (i.e. that it represents another network | access the file system (i.e. that it represents another network | |||
access path to the same file system and can share locking state | access path to the same file system and can share locking state | |||
with it). | with it). | |||
2. Making an initial determination of whether migration has | 2. Making an initial determination of whether migration has | |||
occurred. The initial determination will be based on whether the | occurred. The initial determination will be based on whether the | |||
EXCHANGE_ID results indicate that the current location element is | EXCHANGE_ID results indicate that the current location element is | |||
server-trunkable with that used to access the file system when | server-trunkable with that used to access the file system when | |||
skipping to change at page 37, line 38 ¶ | skipping to change at page 39, line 7 ¶ | |||
During this later phase of the process, further location entries are | During this later phase of the process, further location entries are | |||
examined using the abbreviated procedure specified below: | examined using the abbreviated procedure specified below: | |||
1. Before the EXCHANGE_ID, the fs name of the location entry is | 1. Before the EXCHANGE_ID, the fs name of the location entry is | |||
examined and if it does not match that currently being used, the | examined and if it does not match that currently being used, the | |||
entry is ignored. Otherwise, one proceeds as specified by step 1 | entry is ignored. Otherwise, one proceeds as specified by step 1 | |||
above. | above. | |||
2. In the case that the network address is session-trunkable with | 2. In the case that the network address is session-trunkable with | |||
one used previously, a BIND_CONN_TO_SESSION is used to access that | one used previously, a BIND_CONN_TO_SESSION is used to access that | |||
session using new network address. Otherwise, or if the bind | session using the new network address. Otherwise, or if the bind | |||
operation fails, a CREATE_SESSION is done. | operation fails, a CREATE_SESSION is done. | |||
3. The verification procedure referred to in step 4 above is used. | 3. The verification procedure referred to in step 4 above is used. | |||
However, if it fails, the entry is ignored and the next available | However, if it fails, the entry is ignored and the next available | |||
entry is used. | entry is used. | |||
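The abbreviated procedure can be sketched as follows. This is an illustration under stated assumptions rather than a definitive client implementation: the function name, the dictionary keys, and all five callables are hypothetical stand-ins for the client's real EXCHANGE_ID, BIND_CONN_TO_SESSION, CREATE_SESSION, trunking-detection, and verification logic.

```python
def examine_later_entry(entry, current_fs_name, exchange_id,
                        is_session_trunkable, bind_conn_to_session,
                        create_session, verify):
    """Return a usable session for a later location entry, or None if ignored.

    All callables are hypothetical hooks supplied by the caller:
      exchange_id(address) -> EXCHANGE_ID results
      is_session_trunkable(results) -> bool
      bind_conn_to_session(address) -> session, or None on failure
      create_session(address, results) -> session
      verify(session) -> bool
    """
    # Step 1: ignore entries whose fs name differs from the one in use;
    # otherwise perform the EXCHANGE_ID.
    if entry["fs_name"] != current_fs_name:
        return None
    results = exchange_id(entry["address"])

    # Step 2: try to bind to the existing session when session-trunkable;
    # otherwise, or if the bind fails, create a new session.
    session = None
    if is_session_trunkable(results):
        session = bind_conn_to_session(entry["address"])
    if session is None:
        session = create_session(entry["address"], results)

    # Step 3: run the verification procedure; on failure, ignore the entry
    # so that the caller moves on to the next available one.
    return session if verify(session) else None
```

A caller would loop over the remaining location entries and keep each non-None result as an additional access path.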
10.4. Obtaining Access to Sessions and State after Migration (to be | 10.4. Obtaining Access to Sessions and State after Migration (to be | |||
added) | added) | |||
In the event that migration has occurred, migration recovery will | In the event that migration has occurred, migration recovery will | |||
skipping to change at page 38, line 20 ¶ | skipping to change at page 39, line 37 ¶ | |||
existing client ID representing the client to the destination | existing client ID representing the client to the destination | |||
server. In this state merger case, Transparent State Migration | server. In this state merger case, Transparent State Migration | |||
might or might not have occurred and a determination as to whether | might or might not have occurred and a determination as to whether | |||
it has occurred is deferred until sessions are established and the | it has occurred is deferred until sessions are established and the | |||
client is ready to begin state recovery. | client is ready to begin state recovery. | |||
o If the client ID is a confirmed client ID not previously known to | o If the client ID is a confirmed client ID not previously known to | |||
the client, then the client can conclude that the client ID was | the client, then the client can conclude that the client ID was | |||
transferred as part of Transparent State Migration. In this | transferred as part of Transparent State Migration. In this | |||
transferred client ID case, Transparent State Migration has | transferred client ID case, Transparent State Migration has | |||
occurred although some state may have been lost. | occurred although some state might have been lost. | |||
Once the client ID has been obtained, it is necessary to obtain | Once the client ID has been obtained, it is necessary to obtain | |||
access to sessions to continue communication with the new server. In | access to sessions to continue communication with the new server. In | |||
any of the cases in which Transparent State Migration has occurred, | any of the cases in which Transparent State Migration has occurred, | |||
it is possible that a session was transferred as well. To deal with | it is possible that a session was transferred as well. To deal with | |||
that possibility, clients can, after doing the EXCHANGE_ID, issue a | that possibility, clients can, after doing the EXCHANGE_ID, issue a | |||
BIND_CONN_TO_SESSION to connect the transferred session to a | BIND_CONN_TO_SESSION to connect the transferred session to a | |||
connection to the new server. If that fails, it is an indication | connection to the new server. If that fails, it is an indication | |||
that the session was not transferred and that a new session needs to | that the session was not transferred and that a new session needs to | |||
be created to take its place. | be created to take its place. | |||
skipping to change at page 39, line 29 ¶ | skipping to change at page 40, line 46 ¶ | |||
o In a case in which Transparent State Migration has occurred, and | o In a case in which Transparent State Migration has occurred, and | |||
no lock state was lost (as shown by SEQ4_STATUS flags), no lock | no lock state was lost (as shown by SEQ4_STATUS flags), no lock | |||
reclaim is necessary. | reclaim is necessary. | |||
o In a case in which Transparent State Migration has occurred, and | o In a case in which Transparent State Migration has occurred, and | |||
some lock state was lost (as shown by SEQ4_STATUS flags), existing | some lock state was lost (as shown by SEQ4_STATUS flags), existing | |||
stateids need to be checked for validity using TEST_STATEID, and | stateids need to be checked for validity using TEST_STATEID, and | |||
reclaim used to re-establish any that were not transferred. | reclaim used to re-establish any that were not transferred. | |||
For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value | For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value | |||
of true should be done before normal use of the file system including | of TRUE needs to be done before normal use of the file system | |||
obtaining new locks for the file system. This applies even if no | including obtaining new locks for the file system. This applies even | |||
locks were lost and there was no need for any to be reclaimed. | if no locks were lost and there was no need for any to be reclaimed. | |||
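For the two Transparent State Migration cases listed above, the client's recovery steps can be sketched as below. The function name, the tuple representation of operations, and the `test_stateid` callable are hypothetical; cases in which Transparent State Migration did not occur are deliberately not covered by this sketch.

```python
def plan_lock_recovery(lock_state_lost, stateids, test_stateid):
    """Plan recovery operations after Transparent State Migration.

    lock_state_lost: True if the SEQ4_STATUS flags show lost lock state.
    stateids: stateids the client held on the source server.
    test_stateid: hypothetical callable standing in for TEST_STATEID;
                  returns True if the stateid is valid on the destination.
    """
    steps = []
    if lock_state_lost:
        # Check each existing stateid and reclaim those not transferred.
        for stateid in stateids:
            if not test_stateid(stateid):
                steps.append(("RECLAIM", stateid))
    # RECLAIM_COMPLETE is issued in every case, even with nothing to reclaim.
    steps.append(("RECLAIM_COMPLETE", {"rca_one_fs": True}))
    return steps
```

Placing RECLAIM_COMPLETE unconditionally at the end mirrors the requirement that it precede normal use of the file system whether or not any locks were lost.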
10.5. Obtaining Access to Sessions and State after Network Address | 10.5. Obtaining Access to Sessions and State after Network Address | |||
Transfer (to be added) | Transfer (to be added) | |||
The case in which there is a transfer to a new network address | The case in which there is a transfer to a new network address | |||
without migration is similar to that described in Section 10.4 above | without migration is similar to that described in Section 10.4 above | |||
in that there is a need to obtain access to needed sessions and | in that there is a need to obtain access to needed sessions and | |||
locking state. However, the details are simpler and will vary | locking state. However, the details are simpler and will vary | |||
depending on the type of trunking between the address receiving | depending on the type of trunking between the address receiving | |||
NFS4ERR_MOVED and that to which the transfer is to be made. | NFS4ERR_MOVED and that to which the transfer is to be made. | |||
To make a session available for use, a BIND_CONN_TO_SESSION should be | To make a session available for use, a BIND_CONN_TO_SESSION should be | |||
used to obtain access to the session previously in use. Only if this | used to obtain access to the session previously in use. Only if this | |||
fails, should a CREATE_SESSION be done. While this procedure mirrors | fails, should a CREATE_SESSION be done. While this procedure mirrors | |||
that in Section 10.4 above, there is an important difference in that | that in Section 10.4 above, there is an important difference in that | |||
preservation of the session is not purely optional but depends on the | preservation of the session is not purely optional but depends on the | |||
type of trunking. | type of trunking. | |||
Access to appropriate locking state should need no actions beyond | Access to appropriate locking state should need no actions beyond | |||
access to the session. However. the SEQ4_STATUS bits should be | access to the session. However, the SEQ4_STATUS bits need to be | |||
checked for lost locking state, including the need to reclaim locks | checked for lost locking state, including the need to reclaim locks | |||
after a server reboot. | after a server reboot. | |||
11. Server Responsibilities Upon Migration (to be added) | 11. Server Responsibilities Upon Migration (to be added) | |||
In order to effect Transparent State Migration and possibly session | In the event of file system migration, when the client connects to | |||
migration, the source and server need to co-operate to transfer | the destination server, it needs to be able to provide the client | |||
certain client-relevant information. The sections below discuss the | continued access to the files it had open on the source server. | |||
information to be transferred but do not define the specifics of the | There are two ways to provide this: | |||
transfer protocol. This is left as an implementation choice although | ||||
o By provision of an fs-specific grace period, allowing the client | ||||
the ability to reclaim its locks, in a fashion similar to what | ||||
would have been done in the case of recovery from a server | ||||
restart. See Section 11.1 for a more complete discussion. | ||||
o By implementing Transparent State Migration possibly in connection | ||||
with session migration, the server can provide the client | ||||
immediate access to the state built up on the source server, on | ||||
the destination. | ||||
These features are discussed separately in Sections 11.2 and 11.3, | ||||
which discuss Transparent State Migration and session migration | ||||
respectively. | ||||
All the features described above can involve transfer of lock-related | ||||
information between source and destination servers. In some cases | ||||
this transfer is a necessary part of the implementation while in | ||||
other cases it is a helpful implementation aid which servers might or | ||||
might not use. The sub-sections below discuss the information which | ||||
would be transferred but do not define the specifics of the transfer | ||||
protocol. This is left as an implementation choice although | ||||
standards in this area could be developed at a later time. | standards in this area could be developed at a later time. | |||
Transparent State Migration and session migration are discussed | 11.1. Server Responsibilities in Effecting State Reclaim after | |||
separately, in Sections 11.1 and 11.2 below respectively. In each | Migration (to be added) | |||
case, the discussion addresses the issue of providing the client a | ||||
consistent view of the transferred state, even though the transfer | ||||
might take an extended time. | ||||
11.1. Server Responsibilities in Effecting Transparent State Migration | In this case, the destination server need have no knowledge of the locks | |||
held on the source server, but relies on the clients to accurately | ||||
report (via reclaim operations) the locks previously held, not | ||||
allowing new locks to be granted on the migrated file system until the | ||||
grace period expires. | ||||
During this grace period clients have the opportunity to use reclaim | ||||
operations to obtain locks for file system objects within the | ||||
migrated file system, in the same way that they do when recovering | ||||
from server restart, and the servers typically rely on clients to | ||||
accurately report their locks, although they have the option of | ||||
subjecting these requests to verification. If the clients only | ||||
reclaim locks held on the source server, no conflict can arise. Once | ||||
the client has reclaimed its locks, it indicates the completion of | ||||
lock reclamation by performing a RECLAIM_COMPLETE specifying | ||||
rca_one_fs as TRUE. | ||||
While it is not necessary for source and destination servers to co- | ||||
operate to transfer information about locks, implementations are | ||||
well-advised to consider transferring the following useful | ||||
information: | ||||
o If information about the set of clients that have locking state | ||||
for the transferred file system is made available, the destination | ||||
server will be able to terminate the grace period once all such | ||||
clients have reclaimed their locks, allowing normal locking | ||||
activity to resume earlier than it would have otherwise. | ||||
o Locking summary information for individual clients (at various | ||||
possible levels of detail) can detect some instances in which | ||||
clients do not accurately represent the locks held on the source | ||||
server. | ||||
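The effect of transferring the set of clients holding locking state can be sketched as follows. This is a hypothetical destination-server helper, not something either draft defines: the class and method names and the use of plain numeric timestamps are assumptions.

```python
class FsGracePeriod:
    """Hypothetical per-file-system grace period on the destination server."""

    def __init__(self, now, duration, known_clients=None):
        self.deadline = now + duration
        # None means the source server did not supply the client set,
        # so the grace period can only end by expiring.
        self.pending = None if known_clients is None else set(known_clients)

    def reclaim_complete(self, client_id):
        """Record a RECLAIM_COMPLETE (rca_one_fs TRUE) from a client."""
        if self.pending is not None:
            self.pending.discard(client_id)

    def in_grace(self, now):
        # End early once every known client has finished reclaiming.
        if self.pending is not None and not self.pending:
            return False
        return now < self.deadline

    def may_grant_new_lock(self, now):
        """New (non-reclaim) locks are refused until the grace period ends."""
        return not self.in_grace(now)
```

Without the transferred client set, the helper falls back to waiting out the full grace period, which matches the case in which the servers do not co-operate.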
11.2. Server Responsibilities in Effecting Transparent State Migration | ||||
(to be added) | (to be added) | |||
The basic responsibility of the source server in effecting | The basic responsibility of the source server in effecting | |||
Transparent State Migration is to make available to the destination | Transparent State Migration is to make available to the destination | |||
server a description of each piece of locking state associated with | server a description of each piece of locking state associated with | |||
the file system being migrated. In addition to client id string and | the file system being migrated. In addition to client id string and | |||
verifier, the source server needs to provide, for each stateid: | verifier, the source server needs to provide, for each stateid: | |||
o The stateid including the current sequence value. | o The stateid including the current sequence value. | |||
skipping to change at page 42, line 9 ¶ | skipping to change at page 44, line 30 ¶ | |||
longer exists. | longer exists. | |||
o Sequencing of operations is no longer done using owner-based | o Sequencing of operations is no longer done using owner-based | |||
operation sequence numbers. Instead, sequencing is session- | operation sequence numbers. Instead, sequencing is session- | |||
based. | based. | |||
As a result, when sessions are not transferred, the techniques | As a result, when sessions are not transferred, the techniques | |||
discussed in Section 7.2 of [RFC7931] are adequate and will not be | discussed in Section 7.2 of [RFC7931] are adequate and will not be | |||
further discussed. | further discussed. | |||
11.2. Server Responsibilities in Effecting Session Transfer (to be | 11.3. Server Responsibilities in Effecting Session Transfer (to be | |||
added) | added) | |||
The basic responsibility of the source server in effecting session | The basic responsibility of the source server in effecting session | |||
transfer is to make available to the destination server a description | transfer is to make available to the destination server a description | |||
of the current state of each slot within the session, including: | of the current state of each slot within the session, including: | |||
o The last sequence value received for that slot. | o The last sequence value received for that slot. | |||
o Whether there is cached reply data for the last request executed | o Whether there is cached reply data for the last request executed | |||
and, if so, the cached reply. | and, if so, the cached reply. | |||
skipping to change at page 44, line 8 ¶ | skipping to change at page 46, line 29 ¶ | |||
An important issue is that the specification needs to take note of | An important issue is that the specification needs to take note of | |||
all potential COMPOUNDs, even if they might be unlikely in practice. | all potential COMPOUNDs, even if they might be unlikely in practice. | |||
For example, a COMPOUND is allowed to access multiple file systems | For example, a COMPOUND is allowed to access multiple file systems | |||
and might perform non-idempotent operations in some of them before | and might perform non-idempotent operations in some of them before | |||
accessing a file system being migrated. Also, a COMPOUND may return | accessing a file system being migrated. Also, a COMPOUND may return | |||
considerable data in the response, before being rejected with | considerable data in the response, before being rejected with | |||
NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as | NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as | |||
sa_cachethis. | sa_cachethis. | |||
To address these issues, the destination server MAY do any of the | To address these issues, a destination server MAY do any of the | |||
following. | following when implementing session transfer. | |||
o Avoid enforcing any sequencing semantics for a particular slot | o Avoid enforcing any sequencing semantics for a particular slot | |||
until the client has established the starting sequence for that | until the client has established the starting sequence for that | |||
slot on the destination server. | slot on the destination server. | |||
o For each slot, avoid returning a cached reply returning | o For each slot, avoid returning a cached reply returning | |||
NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established | NFS4ERR_DELAY or NFS4ERR_MOVED until the client has established | |||
the starting sequence for that slot on the destination server. | the starting sequence for that slot on the destination server. | |||
o Until the client has established the starting sequence for a | o Until the client has established the starting sequence for a | |||
particular slot on the destination server, avoid reporting | particular slot on the destination server, avoid reporting | |||
NFS4ERR_SEQ_MISORDERED or return a cached reply returning | NFS4ERR_SEQ_MISORDERED or return a cached reply returning | |||
NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of | NFS4ERR_DELAY or NFS4ERR_MOVED, where the reply consists solely of | |||
a series of operations where the response is NFS4_OK until the | a series of operations where the response is NFS4_OK until the | |||
final error. | final error. | |||
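The first of these options can be sketched as below. The class name, the string results, and the rule that the first request seen on a slot establishes its starting sequence are assumptions made for illustration; cached-reply handling is reduced to a "REPLAY" marker.

```python
class TransferredSlot:
    """Hypothetical destination-server view of one transferred session slot."""

    def __init__(self):
        self.seqid = None
        self.established = False

    def check(self, seqid):
        """Classify an incoming request's sequence value for this slot."""
        if not self.established:
            # No sequencing semantics are enforced until the client has
            # established the starting sequence for this slot.
            self.seqid = seqid
            self.established = True
            return "EXECUTE"
        if seqid == self.seqid:
            return "REPLAY"  # retransmission: answer from the reply cache
        if seqid == (self.seqid + 1) & 0xFFFFFFFF:  # 32-bit wraparound
            self.seqid = seqid
            return "EXECUTE"
        return "NFS4ERR_SEQ_MISORDERED"
```

Once the starting sequence is established, the slot follows ordinary session sequencing.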
12. Changes to RFC5661 outside Section 11 | 12. fs_locations_info | |||
12.1. Updates to treatment of fs_locations_info | ||||
Various elements of the fs_locations_info attribute contain | ||||
information that applies to either a specific filesystem replica or | ||||
to a network path or set of network paths used to access such a | ||||
replica. The existing treatment of fs_locations_info (in | ||||
Section 11.10 of [RFC5661]) does not clearly distinguish these cases, | ||||
in part because the document did not clearly distinguish replicas | ||||
from the paths used to access them. | ||||
In addition, special clarification needed to be provided for: | ||||
o With regard to the handling of FSLI4GF_GOING, it needs to be made | ||||
clear that this only applies to the unavailability of a replica | ||||
rather than to a path to access a replica. | ||||
o In describing the appropriate value for a server to use for | ||||
fli_valid_for, it needs to be made clear that there is no need for | ||||
the client to frequently fetch the fs_locations_info value to be | ||||
prepared for shifts in trunking patterns. | ||||
o Clarification of the rules for extensions of the fls_info needs to | ||||
be provided. The existing treatment reflects the extension model | ||||
in effect at the time [RFC5661] was written, and needs to be | ||||
updated in accord with the extension model described in [RFC8178]. | ||||
12.2. The Attribute fs_locations_info (as updated) | ||||
The fs_locations_info attribute is intended as a more functional | ||||
replacement for the fs_locations attribute which will continue to | ||||
exist and be supported. Clients can use it to get a more complete | ||||
set of data about alternative file system locations, including | ||||
additional network paths to access replicas in use and additional | ||||
replicas. When the server does not support fs_locations_info, | ||||
fs_locations can be used to get a subset of the data. A server that | ||||
supports fs_locations_info MUST support fs_locations as well. | ||||
There is additional data present in fs_locations_info, that is not | ||||
available in fs_locations: | ||||
o Attribute continuity information. This information will allow a | ||||
client to select a replica that meets the transparency | ||||
requirements of the applications accessing the data and to | ||||
leverage optimizations due to the server guarantees of attribute | ||||
continuity (e.g., if the change attribute of a file of the file | ||||
system is continuous between multiple replicas, the client does | ||||
not have to invalidate the file's cache when switching to a | ||||
different replica). | ||||
o File system identity information that indicates when multiple | ||||
replicas, from the client's point of view, correspond to the same | ||||
target file system, allowing them to be used interchangeably, | ||||
without disruption, as distinct synchronized replicas of the same | ||||
file data. | ||||
Note that having two replicas with common identity information is | ||||
distinct from the case of two (trunked) paths to the same replica. | ||||
o Information that will bear on the suitability of various replicas, | ||||
depending on the use that the client intends. For example, many | ||||
applications need an absolutely up-to-date copy (e.g., those that | ||||
write), while others may only need access to the most up-to-date | ||||
copy reasonably available. | ||||
o Server-derived preference information for replicas, which can be | ||||
used to implement load-balancing while giving the client the | ||||
entire file system list to be used in case the primary fails. | ||||
The fs_locations_info attribute is structured similarly to the | ||||
fs_locations attribute. A top-level structure (fs_locations_info4) | ||||
contains the entire attribute including the root pathname of the file | ||||
system and an array of lower-level structures that define replicas | ||||
that share a common rootpath on their respective servers. The lower- | ||||
level structure in turn (fs_locations_item4) contains a specific | ||||
pathname and information on one or more individual network access | ||||
paths. For that last lowest level, fs_locations_info has an | ||||
fs_locations_server4 structure that contains per-server-replica | ||||
information in addition to the location entry. This per-server- | ||||
replica information includes a nominally opaque array, fls_info, | ||||
within which specific pieces of information are located at the | ||||
specific indices listed below. | ||||
Two fs_locations_server4 entries that are within different | ||||
fs_locations_item4 structures are never trunkable, while two entries | ||||
within the same fs_locations_item4 structure might or might not be | ||||
trunkable. Two entries that are trunkable will have identical | ||||
identity information, although, as noted above, the converse is not | ||||
the case. | ||||
The attribute will always contain at least a single | ||||
fs_locations_server4 entry. Typically, there will be an entry with | ||||
the FSLI4GF_CUR_REQ flag set, although in the case of a referral | ||||
there will be no entry with that flag set. | ||||
It should be noted that fs_locations_info attributes returned by | ||||
servers for various replicas may differ for various reasons. One | ||||
server may know about a set of replicas that are not known to other | ||||
servers. Further, compatibility attributes may differ. Filehandles | ||||
might be of the same class going from replica A to replica B but not | ||||
going in the reverse direction. This might happen because the | ||||
filehandles are the same, but replica B's server implementation might | ||||
not have provision to note and report that equivalence. | ||||
The fs_locations_info attribute consists of a root pathname | ||||
(fli_fs_root, just like fs_root in the fs_locations attribute), | ||||
together with an array of fs_locations_item4 structures. The | ||||
fs_locations_item4 structures in turn consist of a root pathname | ||||
(fli_rootpath) together with an array (fli_entries) of elements of | ||||
data type fs_locations_server4, all defined as follows. | ||||
<CODE BEGINS> | ||||
/* | ||||
* Defines an individual server access path | ||||
*/ | ||||
struct fs_locations_server4 { | ||||
        int32_t         fls_currency; | ||||
        opaque          fls_info<>; | ||||
        utf8str_cis     fls_server; | ||||
}; | ||||
/* | ||||
* Byte indices of items within | ||||
* fls_info: flag fields, class numbers, | ||||
* bytes indicating ranks and orders. | ||||
*/ | ||||
const FSLI4BX_GFLAGS = 0; | ||||
const FSLI4BX_TFLAGS = 1; | ||||
const FSLI4BX_CLSIMUL = 2; | ||||
const FSLI4BX_CLHANDLE = 3; | ||||
const FSLI4BX_CLFILEID = 4; | ||||
const FSLI4BX_CLWRITEVER = 5; | ||||
const FSLI4BX_CLCHANGE = 6; | ||||
const FSLI4BX_CLREADDIR = 7; | ||||
const FSLI4BX_READRANK = 8; | ||||
const FSLI4BX_WRITERANK = 9; | ||||
const FSLI4BX_READORDER = 10; | ||||
const FSLI4BX_WRITEORDER = 11; | ||||
/* | ||||
* Bits defined within the general flag byte. | ||||
*/ | ||||
const FSLI4GF_WRITABLE = 0x01; | ||||
const FSLI4GF_CUR_REQ = 0x02; | ||||
const FSLI4GF_ABSENT = 0x04; | ||||
const FSLI4GF_GOING = 0x08; | ||||
const FSLI4GF_SPLIT = 0x10; | ||||
/* | ||||
* Bits defined within the transport flag byte. | ||||
*/ | ||||
const FSLI4TF_RDMA = 0x01; | ||||
/* | ||||
* Defines a set of replicas sharing | ||||
* a common value of the rootpath | ||||
* within the corresponding | ||||
* single-server namespaces. | ||||
*/ | ||||
struct fs_locations_item4 { | ||||
fs_locations_server4 fli_entries<>; | ||||
pathname4 fli_rootpath; | ||||
}; | ||||
/* | ||||
* Defines the overall structure of | ||||
* the fs_locations_info attribute. | ||||
*/ | ||||
struct fs_locations_info4 { | ||||
uint32_t fli_flags; | ||||
int32_t fli_valid_for; | ||||
pathname4 fli_fs_root; | ||||
fs_locations_item4 fli_items<>; | ||||
}; | ||||
/* | ||||
* Flag bits in fli_flags. | ||||
*/ | ||||
const FSLI4IF_VAR_SUB = 0x00000001; | ||||
typedef fs_locations_info4 fattr4_fs_locations_info; | ||||
<CODE ENDS> | ||||
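As an informal aid to reading the XDR above, the nesting of the three structures can be mirrored in Python as follows. This is only a sketch: the class names are adaptations of the XDR type names, pathname4 is modeled as a list of component strings, and all_servers() is a hypothetical convenience helper that is not part of the protocol.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FsLocationsServer4:
    fls_currency: int   # int32_t: seconds relative to the master copy
    fls_info: bytes     # opaque<>: indexed by the FSLI4BX_* constants
    fls_server: str     # utf8str_cis: server name or address


@dataclass
class FsLocationsItem4:
    fli_entries: List[FsLocationsServer4]
    fli_rootpath: List[str]   # pathname4 modeled as a list of components


@dataclass
class FsLocationsInfo4:
    fli_flags: int
    fli_valid_for: int
    fli_fs_root: List[str]
    fli_items: List[FsLocationsItem4] = field(default_factory=list)


def all_servers(info):
    """Yield (fli_rootpath, server entry) pairs (hypothetical helper)."""
    for item in info.fli_items:
        for entry in item.fli_entries:
            yield item.fli_rootpath, entry
```

Walking the attribute with all_servers() visits every network access path together with the rootpath shared by the replicas in its fs_locations_item4.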
As noted above, the fs_locations_info attribute, when supported, may | ||||
be requested of absent file systems without causing NFS4ERR_MOVED to | ||||
be returned. It is generally expected that it will be available for | ||||
both present and absent file systems even if only a single | ||||
fs_locations_server4 entry is present, designating the current | ||||
(present) file system, or two fs_locations_server4 entries | ||||
designating the previous location of an absent file system (the one | ||||
just referenced) and its successor location. Servers are strongly | ||||
urged to support this attribute on all file systems if they support | ||||
it on any file system. | ||||
The data presented in the fs_locations_info attribute may be obtained | ||||
by the server in any number of ways, including specification by the | ||||
administrator or by current protocols for transferring data among | ||||
replicas and protocols not yet developed. NFSv4.1 only defines how | ||||
this information is presented by the server to the client. | ||||
12.2.1. The fs_locations_server4 Structure (as updated) | ||||
The fs_locations_server4 structure consists of the following items in | ||||
addition to the fls_server field which specifies a network address or | ||||
set of addresses to be used to access the specified file system. | ||||
Note that the first two of these items (fls_currency and fls_info) | ||||
specify attributes of the file system replica and should not be | ||||
different when there are multiple fs_locations_server4 structures | ||||
for the same replica, each specifying a network path to the chosen | ||||
replica. | ||||
o An indication of how up-to-date the file system is (fls_currency) | ||||
in seconds. This value is relative to the master copy. A | ||||
negative value indicates that the server is unable to give any | ||||
reasonably useful value here. A value of zero indicates that the | ||||
file system is the actual writable data or a reliably coherent and | ||||
fully up-to-date copy. Positive values indicate how out-of-date | ||||
this copy can normally be before it is considered for update. | ||||
Such a value is not a guarantee that such updates will always be | ||||
performed on the required schedule but instead serves as a hint | ||||
about how far the copy of the data would be expected to be behind | ||||
the most up-to-date copy. | ||||
o A counted array of one-byte values (fls_info) containing | ||||
information about the particular file system instance. This data | ||||
includes general flags, transport capability flags, file system | ||||
equivalence class information, and selection priority information. | ||||
The encoding will be discussed below. | ||||
o The server string (fls_server). For the case of the replica | ||||
currently being accessed (via GETATTR), a zero-length string MAY | ||||
be used to indicate the current address being used for the RPC | ||||
call. The fls_server field can also be an IPv4 or IPv6 address, | ||||
formatted the same way as an IPv4 or IPv6 address in the "server" | ||||
field of the fs_location4 data type (see Section 11.9 of | ||||
[RFC5661]). | ||||
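The three ranges of fls_currency described in the list above can be summarized by a short Python sketch. The function names and the idea of a client-chosen max_lag_seconds bound are assumptions made for illustration. This document does not define a client policy for acting on the value.

```python
def describe_currency(fls_currency):
    """Classify an fls_currency value into the three cases in the text."""
    if fls_currency < 0:
        return "unknown"        # server cannot give a useful value
    if fls_currency == 0:
        return "current"        # writable data or a fully up-to-date copy
    return "may-lag"            # hint: copy may be this many seconds behind


def currency_acceptable(fls_currency, max_lag_seconds):
    """Hypothetical client policy: accept only a known, bounded lag."""
    if fls_currency < 0:
        return False            # this sketch is conservative about unknowns
    return fls_currency <= max_lag_seconds
```

Because a positive value is a hint rather than a guarantee, a client using such a policy still cannot assume that the replica is within the stated bound at any given moment.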
With the exception of the transport-flag field (at offset | ||||
FSLI4BX_TFLAGS within the fls_info array), all of this data applies to | ||||
the replica specified by the entry, rather than the specific network | ||||
path used to access it. | ||||
Data within the fls_info array is in the form of 8-bit data items | ||||
with constants giving the offsets within the array of various values | ||||
describing this particular file system instance. This style of | ||||
definition was chosen, in preference to explicit XDR structure | ||||
definitions for these values, for a number of reasons. | ||||
o The kinds of data in the fls_info array, representing flags, file | ||||
system classes, and priorities among sets of file systems | ||||
representing the same data, are such that 8 bits provide a quite | ||||
acceptable range of values. Even where there might be more than | ||||
256 such file system instances, having more than 256 distinct | ||||
classes or priorities is unlikely. | ||||
o Explicit definition of the various specific data items within XDR | ||||
would limit expandability in that any extension within would | ||||
require yet another attribute, leading to specification and | ||||
implementation clumsiness. In the context of the NFSv4 extension | ||||
model in effect at the time fs_locations_info was designed (i.e., | ||||
that described in [RFC5661]), this would necessitate a new minor | ||||
version to effect any Standards Track extension to the data in | ||||
fls_info. | ||||
The set of fls_info data is subject to expansion in a future minor | ||||
version, or in a Standards Track RFC, within the context of a single | ||||
minor version. The server SHOULD NOT send and the client MUST NOT | ||||
use indices within the fls_info array or flag bits that are not | ||||
defined in Standards Track RFCs. | ||||
In light of the new extension model defined in [RFC8178] and the fact | ||||
that the individual items within fls_info are not explicitly | ||||
referenced in the XDR, the following practices should be followed | ||||
when extending or otherwise changing the structure of the data | ||||
returned in fls_info within the scope of a single minor version. | ||||
o All extensions need to be described by Standards Track documents. | ||||
There is no need for such documents to be marked as updating | ||||
[RFC5661] or this document. | ||||
o It needs to be made clear whether the information in any added | ||||
data items applies to the replica specified by the entry or to the | ||||
specific network paths specified in the entry. | ||||
o There needs to be a reliable way defined to determine whether the | ||||
server is aware of the extension. This may be based on the length | ||||
field of the fls_info array, but it is more flexible to provide | ||||
fs-scope or server-scope attributes to indicate what extensions | ||||
are provided. | ||||
This encoding scheme can be adapted to the specification of multi- | ||||
byte numeric values, even though none are currently defined. If | ||||
extensions are made via Standards Track RFCs, multi-byte quantities | ||||
will be encoded as a range of bytes with a range of indices, with the | ||||
bytes interpreted in big-endian byte order. Further, any such index | ||||
assignments will be constrained by the need for the relevant | ||||
quantities not to cross XDR word boundaries. | ||||
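The constraint on multi-byte quantities can be illustrated with a short Python sketch. No multi-byte item is currently defined, so the index and length used in any call are purely hypothetical. The sketch also assumes that byte index 0 of fls_info is aligned on an XDR word boundary, which follows from the 4-byte length word that precedes variable-length opaque data.

```python
XDR_WORD = 4  # XDR encodes data in units of four bytes


def crosses_word_boundary(first_index, length):
    """True if bytes [first_index, first_index + length) span two XDR words."""
    last_index = first_index + length - 1
    return first_index // XDR_WORD != last_index // XDR_WORD


def read_multibyte(fls_info, first_index, length):
    """Read a hypothetical multi-byte item as a big-endian unsigned value.

    Returns None when the server did not send enough bytes.
    """
    if crosses_word_boundary(first_index, length):
        raise ValueError("item would cross an XDR word boundary")
    if first_index + length > len(fls_info):
        return None
    return int.from_bytes(fls_info[first_index:first_index + length], "big")
```

Under these assumptions, a two-byte item at indices 12 and 13 would be acceptable, while one at indices 11 and 12 would not.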
The fls_info array currently contains: | ||||
o Two 8-bit flag fields, one devoted to general file-system | ||||
characteristics and a second reserved for transport-related | ||||
capabilities. | ||||
o Six 8-bit class values that define various file system equivalence | ||||
classes as explained below. | ||||
o Four 8-bit priority values that govern file system selection as | ||||
explained below. | ||||
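A client might unpack the twelve currently defined bytes along the following lines. This Python sketch is illustrative only: the dictionary keys are the FSLI4BX_* constant names with the prefix removed, and reporting an index that the server did not send as None is a choice made by the sketch, not a protocol rule.

```python
# Byte indices within fls_info, as given by the FSLI4BX_* constants.
FSLI4BX = {
    "GFLAGS": 0, "TFLAGS": 1,
    "CLSIMUL": 2, "CLHANDLE": 3, "CLFILEID": 4,
    "CLWRITEVER": 5, "CLCHANGE": 6, "CLREADDIR": 7,
    "READRANK": 8, "WRITERANK": 9, "READORDER": 10, "WRITEORDER": 11,
}


def decode_fls_info(fls_info):
    """Map each defined index name to its byte value.

    Bytes at indices beyond those defined are ignored, since the client
    must not use indices not defined in Standards Track RFCs.
    """
    return {name: (fls_info[i] if i < len(fls_info) else None)
            for name, i in FSLI4BX.items()}
```

For example, decode_fls_info(raw)["READRANK"] gives the rank to be used for read-only access, where raw is the fls_info value of one fs_locations_server4 entry.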
The general file system characteristics flag (at byte index | ||||
FSLI4BX_GFLAGS) has the following bits defined within it: | ||||
o FSLI4GF_WRITABLE indicates that this file system target is | ||||
writable, allowing it to be selected by clients that may need to | ||||
write on this file system. When the current file system instance | ||||
is writable and is in the same simultaneous-use class (as | ||||
specified by the value at index FSLI4BX_CLSIMUL) as the instance | ||||
to which the client was previously writing, it must incorporate | ||||
within its data any committed write made on the source file system | ||||
instance. | ||||
See Section 8.6, which discusses the write-verifier class. While | ||||
there is no harm in not setting this flag for a file system that | ||||
turns out to be writable, turning the flag on for a read-only file | ||||
system can cause problems for clients that select a migration or | ||||
replication target based on the flag and then find themselves | ||||
unable to write. | ||||
o FSLI4GF_CUR_REQ indicates that this replica is the one on which | ||||
the request is being made. Only a single server entry may have | ||||
this flag set and, in the case of a referral, no entry will have | ||||
it set. Note that this flag might be set even if the request was | ||||
made on a network access path different from any of those | ||||
specified in the current entry. | ||||
o FSLI4GF_ABSENT indicates that this entry corresponds to an absent | ||||
file system replica. It can only be set if FSLI4GF_CUR_REQ is | ||||
set. When both such bits are set, it indicates that a file system | ||||
instance is not usable but that the information in the entry can | ||||
be used to determine the sorts of continuity available when | ||||
switching from this replica to other possible replicas. Since | ||||
this bit can only be true if FSLI4GF_CUR_REQ is true, the value | ||||
could be determined using the fs_status attribute, but the | ||||
information is also made available here for the convenience of the | ||||
client. An entry with this bit set, since it represents a true file | ||||
system (albeit absent), does not appear in the event of a | ||||
referral, but only when a file system has been accessed at this | ||||
location and has subsequently been migrated. | ||||
o FSLI4GF_GOING indicates that a replica, while still available, | ||||
should not be used further. The client, if using it, should make | ||||
an orderly transfer to another file system instance as | ||||
expeditiously as possible. It is expected that file systems going | ||||
out of service will be announced as FSLI4GF_GOING some time before | ||||
the actual loss of service. It is also expected that the | ||||
fli_valid_for value will be sufficiently small to allow clients to | ||||
detect and act on scheduled events, while large enough that the | ||||
cost of the requests to fetch the fs_locations_info values will | ||||
not be excessive. Values on the order of ten minutes seem | ||||
reasonable. | ||||
When this flag is seen as part of a transition into a new file | ||||
system, a client might choose to transfer immediately to another | ||||
replica, or it may reference the current file system and only | ||||
transition when a migration event occurs. Similarly, when this | ||||
flag appears for a replica in a referral, clients would likely | ||||
avoid being referred to this instance whenever there is another | ||||
choice. | ||||
This flag, like the other items within fls_info, applies to the | ||||
replica, rather than to a particular path to that replica. When | ||||
it appears, a transition to a new replica, rather than to a | ||||
different path to the same replica, is indicated. | ||||
o FSLI4GF_SPLIT indicates that when a transition occurs from the | ||||
current file system instance to this one, the replacement may | ||||
consist of multiple file systems. In this case, the client has to | ||||
be prepared for the possibility that objects on the same file | ||||
system before migration will be on different ones after. Note | ||||
that FSLI4GF_SPLIT is not incompatible with the file systems | ||||
belonging to the same fileid class since, if one has a set of | ||||
fileids that are unique within a file system, each subset assigned | ||||
to a smaller file system after migration would not have any | ||||
conflicts internal to that file system. | ||||
A client, in the case of a split file system, will interrogate | ||||
existing files with which it has continuing connection (it is free | ||||
to simply forget cached filehandles). If the client remembers the | ||||
directory filehandle associated with each open file, it may | ||||
proceed upward using LOOKUPP to find the new file system | ||||
boundaries. Note that in the event of a referral, there will not | ||||
be any such files and so these actions will not be performed. | ||||
Instead, a reference to a portion of the original file system now | ||||
split off into other file systems will encounter an fsid change | ||||
and possibly a further referral. | ||||
Once the client recognizes that one file system has been split | ||||
into two, it can prevent the disruption of running applications by | ||||
presenting the two file systems as a single one until a convenient | ||||
point to recognize the transition, such as a restart. This would | ||||
require a mapping from the server's fsids to fsids as seen by the | ||||
client, but this is already necessary for other reasons. As noted | ||||
above, existing fileids within the two descendant file systems | ||||
will not conflict. Providing non-conflicting fileids for newly | ||||
created files on the split file systems is the responsibility of | ||||
the server (or servers working in concert). The server can encode | ||||
filehandles such that filehandles generated before the split event | ||||
can be discerned from those generated after the split, allowing | ||||
the server to determine when the need for emulating two file | ||||
systems as one is over. | ||||
Although it is possible for this flag to be present in the event | ||||
of referral, it would generally be of little interest to the | ||||
client, since the client is not expected to have information | ||||
regarding the current contents of the absent file system. | ||||
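The constraints stated above on the general flag byte lend themselves to a simple consistency check. The following Python sketch takes the general flag bytes of all fs_locations_server4 entries in one attribute value. The function name and the wording of the returned messages are hypothetical, and whether a client rejects or merely logs such a response is left open here.

```python
FSLI4GF_WRITABLE = 0x01
FSLI4GF_CUR_REQ = 0x02
FSLI4GF_ABSENT = 0x04
FSLI4GF_GOING = 0x08
FSLI4GF_SPLIT = 0x10
DEFINED_GFLAGS = (FSLI4GF_WRITABLE | FSLI4GF_CUR_REQ | FSLI4GF_ABSENT
                  | FSLI4GF_GOING | FSLI4GF_SPLIT)


def check_gflags(gflags_per_entry):
    """Return a list of problems found in the general flag bytes."""
    problems = []
    cur_req = [g for g in gflags_per_entry if g & FSLI4GF_CUR_REQ]
    if len(cur_req) > 1:
        problems.append("more than one entry has FSLI4GF_CUR_REQ")
    for i, g in enumerate(gflags_per_entry):
        # FSLI4GF_ABSENT can only be set if FSLI4GF_CUR_REQ is set.
        if (g & FSLI4GF_ABSENT) and not (g & FSLI4GF_CUR_REQ):
            problems.append(f"entry {i}: FSLI4GF_ABSENT without FSLI4GF_CUR_REQ")
        if g & ~DEFINED_GFLAGS:
            problems.append(f"entry {i}: undefined flag bits set")
    return problems
```

An empty result means only that these particular rules are satisfied. In the case of a referral, no entry is expected to have FSLI4GF_CUR_REQ set, which this check accepts.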
The transport-flag field (at byte index FSLI4BX_TFLAGS) contains the | ||||
following bits related to the transport capabilities of the specific | ||||
network path(s) specified by the entry. | ||||
o FSLI4TF_RDMA indicates that any specified network paths provide | ||||
NFSv4.1 clients access using an RDMA-capable transport. | ||||
Attribute continuity and file system identity information are | ||||
expressed by defining equivalence relations on the sets of file | ||||
systems presented to the client. Each such relation is expressed as | ||||
a set of file system equivalence classes. For each relation, a file | ||||
system has an 8-bit class number. Two file systems belong to the | ||||
same class if both have identical non-zero class numbers. Zero is | ||||
treated as non-matching. Most often, the relevant question for the | ||||
client will be whether a given replica is identical to / continuous | ||||
with the current one in a given respect, but the information should | ||||
be available also as to whether two other replicas match in that | ||||
respect as well. | ||||
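The matching rule for class numbers, including the special treatment of zero, can be written down directly. In the Python sketch below, shared_classes() and the descriptive class names are hypothetical. The indices are those given by the FSLI4BX_CL* constants.

```python
# Byte indices of the six class values within fls_info.
CLASS_INDICES = {
    "simultaneous-use": 2, "handle": 3, "fileid": 4,
    "write-verifier": 5, "change": 6, "readdir": 7,
}


def same_class(class_a, class_b):
    """Two file systems match only on identical non-zero class numbers."""
    return class_a != 0 and class_a == class_b


def shared_classes(info_a, info_b):
    """Return the names of the relations in which two replicas match."""
    return {name for name, i in CLASS_INDICES.items()
            if i < len(info_a) and i < len(info_b)
            and same_class(info_a[i], info_b[i])}
```

The same comparison applies whether one of the two fls_info values belongs to the current replica or both belong to other replicas.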
The following fields specify the file system's class numbers for the | ||||
equivalence relations used in determining the nature of file system | ||||
transitions. See Sections 6 through 11 and their various subsections | ||||
for details about how this information is to be used. Servers may | ||||
assign these values as they wish, so long as file system instances | ||||
that share the same value have the specified relationship to one | ||||
another; conversely, file systems that have the specified | ||||
relationship to one another share a common class value. As each | ||||
instance entry is added, the relationships of this instance to | ||||
previously entered instances can be consulted, and if one is found | ||||
that bears the specified relationship, that entry's class value can | ||||
be copied to the new entry. When no such previous entry exists, a | ||||
new value for that byte index (not previously used) can be selected, | ||||
most likely by incrementing the value of the last class value | ||||
assigned for that index. | ||||
o The field with byte index FSLI4BX_CLSIMUL defines the | ||||
simultaneous-use class for the file system. | ||||
o The field with byte index FSLI4BX_CLHANDLE defines the handle | ||||
class for the file system. | ||||
o The field with byte index FSLI4BX_CLFILEID defines the fileid | ||||
class for the file system. | ||||
o The field with byte index FSLI4BX_CLWRITEVER defines the write- | ||||
verifier class for the file system. | ||||
o The field with byte index FSLI4BX_CLCHANGE defines the change | ||||
class for the file system. | ||||
o The field with byte index FSLI4BX_CLREADDIR defines the readdir | ||||
class for the file system. | ||||
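The incremental procedure for assigning class values described before this list can be sketched in Python for a single byte index. The predicate related(a, b) is hypothetical: it stands for whatever server-internal knowledge determines that two instances bear the specified relationship. Giving a fresh non-zero value to an instance that matches no earlier one follows the text above, although a server is free to assign values in other ways.

```python
def assign_classes(instances, related):
    """Assign one class number per instance for a single byte index."""
    classes = []
    last_assigned = 0   # zero is never assigned, since it never matches
    for i, inst in enumerate(instances):
        for j in range(i):
            if related(instances[j], inst):
                classes.append(classes[j])   # copy the earlier entry's value
                break
        else:
            # No previous entry bears the relationship: pick an unused value.
            last_assigned += 1
            if last_assigned > 255:
                raise ValueError("more than 255 classes for one byte index")
            classes.append(last_assigned)
    return classes
```

For instance, with instances named "a1", "a2", "b1", and "a3" and a predicate that relates names with the same first letter, the result is [1, 1, 2, 1].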
Server-specified preference information is also provided via 8-bit | ||||
values within the fls_info array. The values provide a rank and an | ||||
order (see below) to be used with separate values specifiable for the | ||||
cases of read-only and writable file systems. These values are | ||||
compared for different file systems to establish the server-specified | ||||
preference, with lower values indicating "more preferred". | ||||
Rank is used to express a strict server-imposed ordering on clients, | ||||
with lower values indicating "more preferred". Clients should | ||||
attempt to use all replicas with a given rank before they use one | ||||
with a higher rank. Only if all of those file systems are | ||||
unavailable should the client proceed to those of a higher rank. | ||||
Because specifying a rank will override client preferences, servers | ||||
should be conservative about using this mechanism, particularly when | ||||
the environment is one in which client communication characteristics | ||||
are neither tightly controlled nor visible to the server. | ||||
Within a rank, the order value is used to specify the server's | ||||
preference to guide the client's selection when the client's own | ||||
preferences are not controlling, with lower values of order | ||||
indicating "more preferred". If replicas are approximately equal in | ||||
all respects, clients should defer to the order specified by the | ||||
server. When clients look at server latency as part of their | ||||
selection, they are free to use this criterion but it is suggested | ||||
that when latency differences are not significant, the server- | ||||
specified order should guide selection. | ||||
o The field at byte index FSLI4BX_READRANK gives the rank value to | ||||
be used for read-only access. | ||||
o The field at byte index FSLI4BX_READORDER gives the order value to | ||||
be used for read-only access. | ||||
o The field at byte index FSLI4BX_WRITERANK gives the rank value to | ||||
be used for writable access. | ||||
o The field at byte index FSLI4BX_WRITEORDER gives the order value | ||||
to be used for writable access. | ||||
Depending on the potential need for write access by a given client, | ||||
one of the pairs of rank and order values is used. The read rank and | ||||
order should only be used if the client knows that only reading will | ||||
ever be done or if it is prepared to switch to a different replica in | ||||
the event that any write access capability is required in the future. | ||||
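The combined effect of rank and order can be shown by sorting candidate entries on the appropriate pair of bytes. The Python sketch below makes several assumptions: each entry is a (name, fls_info) pair with at least twelve bytes of fls_info, entries without FSLI4GF_WRITABLE are skipped when write access is needed, and the client has no preferences of its own within a rank.

```python
FSLI4BX_GFLAGS = 0
FSLI4BX_READRANK, FSLI4BX_WRITERANK = 8, 9
FSLI4BX_READORDER, FSLI4BX_WRITEORDER = 10, 11
FSLI4GF_WRITABLE = 0x01


def preference_order(entries, need_write):
    """Sort (name, fls_info) pairs from most to least preferred.

    Lower rank always wins; within a rank, lower order is preferred.
    """
    rank_i = FSLI4BX_WRITERANK if need_write else FSLI4BX_READRANK
    order_i = FSLI4BX_WRITEORDER if need_write else FSLI4BX_READORDER
    usable = [e for e in entries
              if not need_write or (e[1][FSLI4BX_GFLAGS] & FSLI4GF_WRITABLE)]
    return sorted(usable, key=lambda e: (e[1][rank_i], e[1][order_i]))
```

A real client would treat the order byte only as a tie-breaker where its own criteria, such as measured latency, do not distinguish the replicas, and would move to a higher rank only when all replicas of the lower rank are unavailable.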
12.2.2. The fs_locations_info4 Structure (as updated) | ||||
The fs_locations_info4 structure, encoding the fs_locations_info | ||||
attribute, contains the following: | ||||
o The fli_flags field, which contains general flags that affect the | ||||
interpretation of this fs_locations_info4 structure and all | ||||
fs_locations_item4 structures within it. The only flag currently | ||||
defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that | ||||
are not defined should always be returned as zero. | ||||
o The fli_fs_root field, which contains the pathname of the root of | ||||
the current file system on the current server, just as it does in | ||||
the fs_locations4 structure. | ||||
o An array called fli_items of fs_locations_item4 structures, which | ||||
contain information about replicas of the current file system. | ||||
Where the current file system is actually present, or has been | ||||
present, i.e., this is not a referral situation, one of the | ||||
fs_locations_item4 structures will contain an fs_locations_server4 | ||||
for the current server. This structure will have FSLI4GF_ABSENT | ||||
set if the current file system is absent, i.e., normal access to | ||||
it will return NFS4ERR_MOVED. | ||||
o The fli_valid_for field specifies a time in seconds for which it | ||||
is reasonable for a client to use the fs_locations_info attribute | ||||
without refetch. The fli_valid_for value does not provide a | ||||
guarantee of validity since servers can unexpectedly go out of | ||||
service or become inaccessible for any number of reasons. Clients | ||||
are well-advised to refetch this information for an actively | ||||
accessed file system every fli_valid_for seconds. This is | ||||
particularly important when file system replicas may go out of | ||||
service in a controlled way using the FSLI4GF_GOING flag to | ||||
communicate an ongoing change. The server should set | ||||
fli_valid_for to a value that allows well-behaved clients to | ||||
notice the FSLI4GF_GOING flag and make an orderly switch before | ||||
the loss of service becomes effective. If this value is zero, | ||||
then no refetch interval is appropriate and the client need not | ||||
refetch this data on any particular schedule. In the event of a | ||||
transition to a new file system instance, a new value of the | ||||
fs_locations_info attribute will be fetched at the destination. | ||||
It is to be expected that this may have a different fli_valid_for | ||||
value, which the client should then use in the same fashion as the | ||||
previous value. Because a refetch of the attribute causes | ||||
information from all component entries to be refetched, the server | ||||
will typically provide a low value for this field if any of the | ||||
replicas are likely to go out of service in a short time frame. | ||||
Note that, because of the ability of the server to return | ||||
NFS4ERR_MOVED to trigger the use of different paths, when alternate | ||||
trunked paths are available, there is generally no need to use low | ||||
values of fli_valid_for in connection with the management of | ||||
alternate paths to the same replica. | ||||
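The handling of fli_valid_for described in the list above amounts to a simple scheduling rule, sketched here in Python. The function name and the use of a numeric timestamp are assumptions. Negative values are not discussed in this section and are not handled specially by the sketch.

```python
def next_refetch_time(fetched_at, fli_valid_for):
    """Return the time at which to refetch fs_locations_info, or None.

    fetched_at is the time (in seconds) at which the attribute was
    last obtained.  A zero fli_valid_for means that no particular
    refetch schedule applies.
    """
    if fli_valid_for == 0:
        return None
    return fetched_at + fli_valid_for
```

After a transition to a new file system instance, the client would call this again with the fli_valid_for value fetched at the destination.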
The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable | ||||
substitution is to be enabled. See Section 12.2.3 for an explanation | ||||
of variable substitution. | ||||
12.2.3. The fs_locations_item4 Structure (as updated) | ||||
The fs_locations_item4 structure contains a pathname (in the field | ||||
fli_rootpath) that encodes the path of the target file system | ||||
replicas on the set of servers designated by the included | ||||
fs_locations_server4 entries. The precise manner in which this | ||||
target location is specified depends on the value of the | ||||
FSLI4IF_VAR_SUB flag within the associated fs_locations_info4 | ||||
structure. | ||||
If this flag is not set, then fli_rootpath simply designates the | ||||
location of the target file system within each server's single-server | ||||
namespace just as it does for the rootpath within the fs_location4 | ||||
structure. When this bit is set, however, component entries of a | ||||
certain form are subject to client-specific variable substitution so | ||||
as to allow a degree of namespace non-uniformity in order to | ||||
accommodate the selection of client-specific file system targets to | ||||
adapt to different client architectures or other characteristics. | ||||
When such substitution is in effect, a variable beginning with the | ||||
string "${" and ending with the string "}" and containing a colon is | ||||
to be replaced by the client-specific value associated with that | ||||
variable. The string "unknown" should be used by the client when it | ||||
has no value for such a variable. The pathname resulting from such | ||||
substitutions is used to designate the target file system, so that | ||||
different clients may have different file systems, corresponding to | ||||
that location in the multi-server namespace. | ||||
As mentioned above, such substituted pathname variables contain a | ||||
colon. The part before the colon is to be a DNS domain name, and the | ||||
part after is to be a case-insensitive alphanumeric string. | ||||
Where the domain is "ietf.org", only variable names defined in this | ||||
document or subsequent Standards Track RFCs are subject to such | ||||
substitution. Organizations are free to use their domain names to | ||||
create their own sets of client-specific variables, to be subject to | ||||
such substitution. In cases where such variables are intended to be | ||||
used more broadly than a single organization, publication of an | ||||
Informational RFC defining such variables is RECOMMENDED. | ||||
The variable ${ietf.org:CPU_ARCH} is used to denote the CPU | ||||
architecture for which object files are compiled. This specification | ||||
does not | ||||
limit the acceptable values (except that they must be valid UTF-8 | ||||
strings), but such values as "x86", "x86_64", and "sparc" would be | ||||
expected to be used in line with industry practice. | ||||
The variable ${ietf.org:OS_TYPE} is used to denote the operating | ||||
system, and thus the kernel and library APIs, for which code might be | ||||
compiled. This specification does not limit the acceptable values | ||||
(except that they must be valid UTF-8 strings), but such values as | ||||
"linux" and "freebsd" would be expected to be used in line with | ||||
industry practice. | ||||
The variable ${ietf.org:OS_VERSION} is used to denote the operating | ||||
system version, and thus the specific details of versioned | ||||
interfaces, for which code might be compiled. This specification | ||||
does not limit the acceptable values (except that they must be valid | ||||
UTF-8 strings). However, combinations of numbers and letters with | ||||
interspersed dots would be expected to be used in line with industry | ||||
practice, with the details of the version format depending on the | ||||
specific value of the variable ${ietf.org:OS_TYPE} with which it is | ||||
used. | ||||
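The substitution rules given so far in this section can be sketched in Python as follows. Several details are assumptions rather than requirements of this document: client values are looked up in a dictionary keyed by lower-cased (domain, name) pairs, underscores are accepted in variable names because the names defined above contain them, an undefined "ietf.org" name is left untouched, and a variable in any other domain for which the client has no value is replaced by "unknown".

```python
import re

FSLI4IF_VAR_SUB = 0x00000001

# Variable names defined for the "ietf.org" domain in this section.
IETF_VARIABLES = {"cpu_arch", "os_type", "os_version"}

# "${" domain ":" name "}"
_VAR = re.compile(r"\$\{([^${}:]+):([A-Za-z0-9_]+)\}")


def substitute_component(component, client_values):
    """Apply variable substitution to one pathname component."""
    def replace(match):
        domain, name = match.group(1).lower(), match.group(2).lower()
        if domain == "ietf.org" and name not in IETF_VARIABLES:
            return match.group(0)   # not a defined variable: leave as-is
        return client_values.get((domain, name), "unknown")
    return _VAR.sub(replace, component)


def substitute_path(fli_rootpath, fli_flags, client_values):
    """Return the pathname components to use for the target file system."""
    if not fli_flags & FSLI4IF_VAR_SUB:
        return list(fli_rootpath)
    return [substitute_component(c, client_values) for c in fli_rootpath]
```

With client values for CPU_ARCH ("x86_64") and OS_TYPE ("linux") but none for OS_VERSION, the components "${ietf.org:CPU_ARCH}" and "${ietf.org:OS_TYPE}-${ietf.org:OS_VERSION}" become "x86_64" and "linux-unknown" when FSLI4IF_VAR_SUB is set, and are left unchanged when it is not.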
Use of these variables could result in the direction of different | ||||
clients to different file systems on the same server, as appropriate | ||||
to particular clients. In cases in which the target file systems are | ||||
located on different servers, a single server could serve as a | ||||
referral point so that each valid combination of variable values | ||||
would designate a referral hosted on a single server, with the | ||||
targets of those referrals on a number of different servers. | ||||
Because namespace administration is affected by the values selected | ||||
to substitute for various variables, clients should provide | ||||
convenient means of determining what variable substitutions a client | ||||
will implement, as well as, where appropriate, providing means to | ||||
control the substitutions to be used. The exact means by which this | ||||
will be done is outside the scope of this specification. | ||||
Although variable substitution is most suitable for use in the | ||||
context of referrals, it may be used in the context of replication | ||||
and migration. If it is used in these contexts, the server must | ||||
ensure that no matter what values the client presents for the | ||||
substituted variables, the result is always a valid successor file | ||||
system instance to that from which a transition is occurring, i.e., | ||||
that the data is identical or represents a later image of a writable | ||||
file system. | ||||
Note that when fli_rootpath is a null pathname (that is, one with | ||||
zero components), the file system designated is at the root of the | ||||
specified server, whether or not the FSLI4IF_VAR_SUB flag within the | ||||
associated fs_locations_info4 structure is set. | ||||
13. Changes to RFC5661 outside Section 11 | ||||
Beside the major rework of Section 11, there are a number of related | Beside the major rework of Section 11, there are a number of related | |||
changes that are necessary: | changes that are necessary: | |||
o The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to | o The summary that appeared in Section 1.7.3.3 of [RFC5661] needs to | |||
be revised to reflect the changes called for in Section 4 of the | be revised to reflect the changes called for in Section 4 of the | |||
current document. The updated summary appears as Section 12.1 | current document. The updated summary appears as Section 13.1 | |||
below. | below. | |||
o The discussion of server scope which appeared in Section 2.10.4 of | o The discussion of server scope which appeared in Section 2.10.4 of | |||
[RFC5661] needs to be replaced, since the existing text appears to | [RFC5661] needs to be replaced, since the existing text appears to | |||
require a level of inter-server co-ordination incompatible with | require a level of inter-server co-ordination incompatible with | |||
its basic function of avoiding the need for a globally uniform | its basic function of avoiding the need for a globally uniform | |||
means of assigning server_owner values. A revised treatment | means of assigning server_owner values. A revised treatment | |||
appears Section 12.2 below. | appears in Section 13.2 below. | |||
o While the last paragraph (exclusive of sub-sections) of | o While the last paragraph (exclusive of sub-sections) of | |||
Section 2.10.5 in [RFC5661], dealing with server_owner changes, is | Section 2.10.5 in [RFC5661], dealing with server_owner changes, is | |||
literally true, it has been a source of confusion. Since the | literally true, it has been a source of confusion. Since the | |||
existing paragraph can be read as suggesting that such changes be | existing paragraph can be read as suggesting that such changes be | |||
dealt with non-disruptively, the treatment in Section 12.4 below | dealt with non-disruptively, the treatment in Section 13.4 below | |||
needs to be substituted. | needs to be substituted. | |||
o The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of | o The existing definition of NFS4ERR_MOVED (in Section 15.1.2.4 of | |||
[RFC5661]) needs to be updated to reflect the different handling | [RFC5661]) needs to be updated to reflect the different handling | |||
of unavailability of a particular fs via a specific network | of unavailability of a particular fs via a specific network | |||
address. Since such a situation is no longer considered to | address. Since such a situation is no longer considered to | |||
constitute unavailability of a file system instance, the | constitute unavailability of a file system instance, the | |||
description needs to change even though the instances in which it | description needs to change even though the set of circumstances | |||
is returned remain the same. The updated description appears in | in which it is to be returned remain the same. The updated | |||
Section 12.3 below. | description appears in Section 13.3 below. | |||
o The existing treatment of EXCHANGE_ID (in Section 18.35 of | o The existing treatment of EXCHANGE_ID (in Section 18.35 of | |||
[RFC5661]) assumes that client IDs cannot be created/ confirmed | [RFC5661]) assumes that client IDs cannot be created/ confirmed | |||
other than by the EXCHANGE_ID and CREATE_SESSION operations. | other than by the EXCHANGE_ID and CREATE_SESSION operations. | |||
Also, the necessary use of EXCHANGE_ID in recovery from migration | Also, the necessary use of EXCHANGE_ID in recovery from migration | |||
and related situations is not addressed clearly. A revised | and related situations is not addressed clearly. A revised | |||
treatment of EXCHANGE_ID is necessary and it appears in Section 13 | treatment of EXCHANGE_ID is necessary and it appears in Section 14 | |||
below while the specific differences between it and the treatment | below while the specific differences between it and the treatment | |||
within [RFC5661] are explained in Section 12.5 below. | within [RFC5661] are explained in Section 13.5 below. | |||
12.1. (Introduction to) Multi-Server Namespace (as updated) | o The existing treatment of RECLAIM_COMPLETE (in Section 18.51 of | |||
[RFC5661]) is not sufficiently clear about the purpose and use of | ||||
the rca_one_fs argument and how the server is to deal with inappropriate | ||||
values of this argument. Because the resulting confusion raises | ||||
interoperability issues, a new treatment of RECLAIM_COMPLETE is | ||||
necessary and it appears in Section 15 below while the specific | ||||
differences between it and the treatment within [RFC5661] are | ||||
discussed in Section 13.6 below. In addition, the definitions of | ||||
the reclaim-related errors receive an updated treatment in | ||||
Section 13.7 to reflect the fact that there are multiple contexts | ||||
for lock reclaim operations. | ||||
13.1. (Introduction to) Multi-Server Namespace (as updated) | ||||
NFSv4.1 contains a number of features to allow implementation of | NFSv4.1 contains a number of features to allow implementation of | |||
namespaces that cross server boundaries and that allow and facilitate | namespaces that cross server boundaries and that allow and facilitate | |||
a non-disruptive transfer of support for individual file systems | a non-disruptive transfer of support for individual file systems | |||
between servers. They are all based upon attributes that allow one | between servers. They are all based upon attributes that allow one | |||
file system to specify alternate, additional, and new location | file system to specify alternate, additional, and new location | |||
information which specifies how the client may access that | information which specifies how the client may access that | |||
file system. | file system. | |||
These attributes can be used to provide for individual active file | These attributes can be used to provide for individual active file | |||
skipping to change at page 45, line 42 ¶ | skipping to change at page 62, line 32 ¶ | |||
o Alternate network addresses to access the current file system | o Alternate network addresses to access the current file system | |||
instance. | instance. | |||
o The locations of alternate file system instances or replicas to be | o The locations of alternate file system instances or replicas to be | |||
used in the event that the current file system instance becomes | used in the event that the current file system instance becomes | |||
unavailable. | unavailable. | |||
These attributes may be used together with the concept of absent file | These attributes may be used together with the concept of absent file | |||
systems, in which a position in the server namespace is associated | systems, in which a position in the server namespace is associated | |||
with locations on other servers without any file system instance on | with locations on other servers without there being any corresponding | |||
the current server. | file system instance on the current server. | |||
o Location attributes may be used with absent file systems to | o Location attributes may be used with absent file systems to | |||
implement referrals whereby one server may direct the client to a | implement referrals whereby one server may direct the client to a | |||
file system provided by another server. This allows extensive | file system provided by another server. This allows extensive | |||
multi-server namespaces to be constructed. | multi-server namespaces to be constructed. | |||
o Location attributes may be provided when a previously present file | o Location attributes may be provided when a previously present file | |||
system becomes absent. This allows non-disruptive migration of | system becomes absent. This allows non-disruptive migration of | |||
file systems to alternate servers. | file systems to alternate servers. | |||
12.2. Server Scope (as updated) | 13.2. Server Scope (as updated) | |||
Servers each specify a server scope value in the form of an opaque | Servers each specify a server scope value in the form of an opaque | |||
string eir_server_scope returned as part of the results of an | string eir_server_scope returned as part of the results of an | |||
EXCHANGE_ID operation. The purpose of the server scope is to allow a | EXCHANGE_ID operation. The purpose of the server scope is to allow a | |||
group of servers to indicate to clients that a set of servers sharing | group of servers to indicate to clients that a set of servers sharing | |||
the same server scope value has arranged to use compatible values of | the same server scope value has arranged to use compatible values of | |||
otherwise opaque identifiers. Thus, the identifiers generated by two | otherwise opaque identifiers. Thus, the identifiers generated by two | |||
servers within that set can be assumed compatible so that, in some | servers within that set can be assumed compatible so that, in some | |||
cases, identifiers generated by one server in that set may be | cases, identifiers generated by one server in that set may be | |||
presented to another server of the same scope. | presented to another server of the same scope. | |||
skipping to change at page 47, line 39 ¶ | skipping to change at page 64, line 27 ¶ | |||
first server for the fs_locations or fs_locations_info attribute | first server for the fs_locations or fs_locations_info attribute | |||
with RPCSEC_GSS authentication. It may need to do this in advance | with RPCSEC_GSS authentication. It may need to do this in advance | |||
of the need to verify the common server scope. If the client | of the need to verify the common server scope. If the client | |||
successfully authenticates the reply to GETATTR, and the GETATTR | successfully authenticates the reply to GETATTR, and the GETATTR | |||
request and reply containing the fs_locations or fs_locations_info | request and reply containing the fs_locations or fs_locations_info | |||
attribute refers to the second server, then the equality of server | attribute refers to the second server, then the equality of server | |||
scope is supported. A client may choose to limit the use of this | scope is supported. A client may choose to limit the use of this | |||
form of support to information relevant to the specific file | form of support to information relevant to the specific file | |||
system involved (e.g. a file system being migrated). | system involved (e.g. a file system being migrated). | |||
12.3. Revised Treatment of NFS4ERR_MOVED | 13.3. Revised Treatment of NFS4ERR_MOVED | |||
Because the term "replica" is now used differently, the current | Because of the need to appropriately address trunking-related issues, | |||
description of NFS4ERR_MOVED needs to be changed to the one below. | some uses of the term "replica" in [RFC5661] have become problematic | |||
The new paragraph explicitly recognizes that a different network | since a shift in network access paths was considered to be a shift to | |||
address might be used, while the previous description, misleadingly, | a different replica. As a result, the description of NFS4ERR_MOVED | |||
treated this as a shift between two replicas while only a single file | in [RFC5661] needs to be changed to the one below. The new paragraph | |||
system instance might be involved. | explicitly recognizes that a different network address might be used, | |||
while the previous description, misleadingly, treated this as a shift | ||||
between two replicas while only a single file system instance might | ||||
be involved. | ||||
The file system that contains the current filehandle object is not | The file system that contains the current filehandle object is not | |||
accessible using the address on which the request was made. It | accessible using the address on which the request was made. It | |||
still might be accessible using other addresses server-trunkable | still might be accessible using other addresses server-trunkable | |||
with it or it might not be present at the server. In the latter | with it or it might not be present at the server. In the latter | |||
case, it might have been relocated or migrated to another server, | case, it might have been relocated or migrated to another server, | |||
or it might have never been present. The client may obtain | or it might have never been present. The client may obtain | |||
information regarding access to the file system location by | information regarding access to the file system location by | |||
obtaining the "fs_locations" or "fs_locations_info" attribute for | obtaining the "fs_locations" or "fs_locations_info" attribute for | |||
the current filehandle. For further discussion, refer to | the current filehandle. For further discussion, refer to | |||
Section 11 of [RFC5661], as modified by the current document. | Section 11 of [RFC5661], as modified by the current document. | |||
12.4. Revised Discussion of Server_owner changes | 13.4. Revised Discussion of Server_owner changes | |||
Because of problems with the treatment of such changes, the confusing | Because of likely problems with the treatment of such changes, a | |||
paragraph, which simply says that such changes need to be dealt with, | confusing paragraph that appears at the end of Section 2.5.10 of | |||
is to be replaced by the one below. | [RFC5661], which simply says that such changes need to be dealt with, | |||
is to be replaced by the material below. | ||||
It is always possible that, as a result of various sorts of | It is always possible that, as a result of various sorts of | |||
reconfiguration events, eir_server_scope and eir_server_owner | reconfiguration events, eir_server_scope and eir_server_owner | |||
values may be different on subsequent EXCHANGE_ID requests made to | values may be different on subsequent EXCHANGE_ID requests made to | |||
the same network address. | the same network address. | |||
In most cases such reconfiguration events will be disruptive and | In most cases such reconfiguration events will be disruptive and | |||
indicate that an IP address formerly connected to one server is | indicate that an IP address formerly connected to one server is | |||
now connected to an entirely different one. | now connected to an entirely different one. | |||
skipping to change at page 49, line 5 ¶ | skipping to change at page 65, line 43 ¶ | |||
eir_server_owner.so_major_id changes, the client can use | eir_server_owner.so_major_id changes, the client can use | |||
filehandles it has and attempt reclaims. It may find that | filehandles it has and attempt reclaims. It may find that | |||
these are now stale but if NFS4ERR_STALE is not received, it | these are now stale but if NFS4ERR_STALE is not received, it | |||
can proceed to reclaim its opens. | can proceed to reclaim its opens. | |||
* When eir_server_scope and eir_server_owner.so_major_id remain | * When eir_server_scope and eir_server_owner.so_major_id remain | |||
the same, the client has to use the now-current values of | the same, the client has to use the now-current values of | |||
eir_server_owner.so_minor_id in deciding on appropriate forms | eir_server_owner.so_minor_id in deciding on appropriate forms | |||
of trunking. | of trunking. | |||
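The cases above can be sketched from the client's point of view. This is a minimal illustration, not part of the protocol definition: the names ServerIdentity and classify_change are hypothetical, and the sketch assumes that the reclaim case applies when eir_server_scope is unchanged while so_major_id differs. Handling of a changed eir_server_scope is not shown in the text above, so the sketch only reports it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerIdentity:
    """Values returned by EXCHANGE_ID (hypothetical container)."""
    scope: bytes      # eir_server_scope
    major_id: bytes   # eir_server_owner.so_major_id
    minor_id: int     # eir_server_owner.so_minor_id


def classify_change(old: ServerIdentity, new: ServerIdentity) -> str:
    """Compare two EXCHANGE_ID results obtained from the same network address."""
    if old.scope != new.scope:
        # Not covered by the text shown above; only reported here.
        return "scope_changed"
    if old.major_id != new.major_id:
        # Assumption: same scope, different major ID. The client may use
        # its filehandles and attempt reclaims unless NFS4ERR_STALE is seen.
        return "attempt_reclaim"
    # Same scope and major ID: use the now-current so_minor_id when
    # deciding on appropriate forms of trunking.
    return "reevaluate_trunking"
```

A client would typically call such a function each time it repeats EXCHANGE_ID on an address it has used before, keeping the earlier result for comparison.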
12.5. Revision to Treatment of EXCHANGE_ID | 13.5. Revision to Treatment of EXCHANGE_ID | |||
There are a number of issues in the original treatment of EXCHANGE_ID | There are a number of issues in the original treatment of EXCHANGE_ID | |||
(in [RFC5661]) that cause problems for Transparent State Migration | (in [RFC5661]) that cause problems for Transparent State Migration | |||
and for the transfer of access between different network access paths | and for the transfer of access between different network access paths | |||
to the same file system instance. | to the same file system instance. | |||
These issues arise from the fact that this treatment was written: | These issues arise from the fact that this treatment was written: | |||
o assuming that a client ID can only become known to a server by | o Assuming that a client ID can only become known to a server by | |||
having been created by executing an EXCHANGE_ID, with confirmation | having been created by executing an EXCHANGE_ID, with confirmation | |||
of the ID only possible by execution of a CREATE_SESSION. | of the ID only possible by execution of a CREATE_SESSION. | |||
o Considering the interactions between a client and a server only on | o Considering the interactions between a client and a server only on | |||
a single network address. | a single network address. | |||
As these assumptions have become invalid in the context of | As these assumptions have become invalid in the context of | |||
Transparent State Migration and active use of trunking, the treatment | Transparent State Migration and active use of trunking, the treatment | |||
has been modified in several respects. | has been modified in several respects. | |||
o It had been assumed that an EXCHANGE_ID executed when the server | o It had been assumed that an EXCHANGE_ID executed when the server | |||
is already aware of a given client instance must be either | is already aware of a given client instance must be either | |||
updating associated parameters (e.g. with respect to callbacks) or | updating associated parameters (e.g. with respect to callbacks) or | |||
a lingering retransmission to deal with a previously lost reply. | a lingering retransmission to deal with a previously lost reply. | |||
As a result, any slot sequence returned would be of no use. The | As a result, any slot sequence returned by that operation would be | |||
existing treatment goes so far as to say that it "MUST NOT" be | of no use. The existing treatment goes so far as to say that it | |||
used, although this usage is not in accord with [RFC2119]. This | "MUST NOT" be used, although this usage is not in accord with | |||
created a difficulty when an EXCHANGE_ID is done after Transparent | [RFC2119]. This created a difficulty when an EXCHANGE_ID is done | |||
State Migration since that slot sequence needs to be used in a | after Transparent State Migration since that slot sequence would | |||
subsequent CREATE_SESSION. | need to be used in a subsequent CREATE_SESSION. | |||
In the updated treatment, CREATE_SESSION is a way that client IDs | In the updated treatment, CREATE_SESSION is a way that client IDs | |||
are confirmed but it is understood that other ways are possible. | are confirmed but it is understood that other ways are possible. | |||
The slot sequence can be used as needed and cases in which it | The slot sequence can be used as needed and cases in which it | |||
would be of no use are appropriately noted. | would be of no use are appropriately noted. | |||
o It was assumed that the only functions of EXCHANGE_ID were to | o It was assumed that the only functions of EXCHANGE_ID were to | |||
inform the server of the client, create the client ID, and | inform the server of the client, create the client ID, and | |||
communicate it to the client. When multiple simultaneous | communicate it to the client. When multiple simultaneous | |||
connections are involved, as often happens when trunking, that | connections are involved, as often happens when trunking, that | |||
skipping to change at page 50, line 7 ¶ | skipping to change at page 66, line 47 ¶ | |||
EXCHANGE_ID in associating the client ID with the connection on | EXCHANGE_ID in associating the client ID with the connection on | |||
which it was done, so that it could be used by a subsequent | which it was done, so that it could be used by a subsequent | |||
CREATE_SESSION, whose parameters do not include an explicit | CREATE_SESSION, whose parameters do not include an explicit | |||
client ID. | client ID. | |||
The new treatment explicitly discusses the role of EXCHANGE_ID in | The new treatment explicitly discusses the role of EXCHANGE_ID in | |||
associating the client ID with the connection so it can be used by | associating the client ID with the connection so it can be used by | |||
CREATE_SESSION and in associating a connection with an existing | CREATE_SESSION and in associating a connection with an existing | |||
session. | session. | |||
The new treatment can be found in Section 13 below. It is intended | The new treatment can be found in Section 14 below. It is intended | |||
to supersede the treatment in Section 18.35 of [RFC5661]. Publishing | to supersede the treatment in Section 18.35 of [RFC5661]. Publishing | |||
a complete replacement for Section 18.35 allows the corrected | a complete replacement for Section 18.35 allows the corrected | |||
definition to be read as a whole once [RFC5661] is updated. | definition to be read as a whole once [RFC5661] is updated. | |||
13. Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated) | 13.6. Revision to Treatment of RECLAIM_COMPLETE | |||
The following changes were made to the treatment of RECLAIM_COMPLETE | ||||
in [RFC5661] to arrive at the treatment in Section 15. | ||||
o In a number of places the text is more explicit about the purpose | ||||
of rca_one_fs and its connection to file system migration. | ||||
o There is a discussion of situations in which either form of | ||||
RECLAIM_COMPLETE would need to be done. | ||||
o There is a discussion of interoperability issues that result from | ||||
implementations that may have arisen due to the lack of clarity of | ||||
the previous treatment of RECLAIM_COMPLETE. | ||||
13.7. Reclaim Errors (as updated) | ||||
These errors relate to the process of reclaiming locks after a server | ||||
restart or in connection with the migration of a file system (i.e. in | ||||
the case in which rca_one_fs is TRUE). | ||||
13.7.1. NFS4ERR_COMPLETE_ALREADY (as updated; Error Code 10054) | ||||
The client previously sent a successful RECLAIM_COMPLETE operation | ||||
specifying the same scope, whether that scope is global or for the | ||||
same file system in the case of a per-fs RECLAIM_COMPLETE. An | ||||
additional RECLAIM_COMPLETE operation is not necessary and results in | ||||
this error. | ||||
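The scope rule above can be illustrated with a small per-client tracker on the server side. This is a sketch under stated assumptions: the class ReclaimCompleteTracker and its fsid parameter are hypothetical (in the protocol, the file system for a per-fs RECLAIM_COMPLETE is determined by the current filehandle), and only the duplicate-detection aspect is modeled.

```python
NFS4_OK = 0
NFS4ERR_COMPLETE_ALREADY = 10054


class ReclaimCompleteTracker:
    """Tracks successful RECLAIM_COMPLETE scopes for one client (hypothetical)."""

    def __init__(self):
        self._global_done = False   # global RECLAIM_COMPLETE already seen
        self._fs_done = set()       # file systems with a per-fs RECLAIM_COMPLETE

    def reclaim_complete(self, rca_one_fs, fsid=None):
        if rca_one_fs:
            # Per-fs scope: a repeat for the same file system is an error.
            if fsid is None:
                raise ValueError("per-fs RECLAIM_COMPLETE needs a file system")
            if fsid in self._fs_done:
                return NFS4ERR_COMPLETE_ALREADY
            self._fs_done.add(fsid)
            return NFS4_OK
        # Global scope: a repeat of the global form is an error.
        if self._global_done:
            return NFS4ERR_COMPLETE_ALREADY
        self._global_done = True
        return NFS4_OK
```

Note that the global and per-fs scopes are tracked separately, so a per-fs RECLAIM_COMPLETE following a global one is not treated as a duplicate in this sketch.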
13.7.2. NFS4ERR_GRACE (as updated; Error Code 10013) | ||||
The server was in its recovery or grace period, with regard to the | ||||
file system object for which the lock was requested. The locking | ||||
request was not a reclaim request and so could not be granted during | ||||
that period. | ||||
13.7.3. NFS4ERR_NO_GRACE (as updated; Error Code 10033) | ||||
A reclaim of client state was attempted in circumstances in which the | ||||
server cannot guarantee that conflicting state has not been provided | ||||
to another client. This can occur because the reclaim has been done | ||||
outside of a grace period implemented by the server, after the | ||||
client has done a RECLAIM_COMPLETE operation which ends its ability | ||||
to reclaim the requested lock, or because previous operations have | ||||
created a situation in which the server is not able to determine that | ||||
a reclaim-interfering edge condition does not exist. | ||||
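The distinction between NFS4ERR_GRACE and NFS4ERR_NO_GRACE can be sketched as a simple server-side check. This is an illustrative simplification: the function name and its boolean parameters are hypothetical, and the third NFS4ERR_NO_GRACE condition (a possible reclaim-interfering edge condition) is not modeled.

```python
NFS4_OK = 0
NFS4ERR_GRACE = 10013
NFS4ERR_NO_GRACE = 10033


def check_lock_request(is_reclaim, in_grace, reclaim_complete_done):
    """Return the status a server might give a locking request.

    in_grace refers to the grace period that applies to the file system
    object involved; reclaim_complete_done means the client has already
    sent a RECLAIM_COMPLETE that ends its ability to reclaim this lock.
    """
    if is_reclaim:
        if not in_grace or reclaim_complete_done:
            # The server cannot guarantee the absence of conflicting state.
            return NFS4ERR_NO_GRACE
        return NFS4_OK
    if in_grace:
        # Non-reclaim requests cannot be granted during the grace period.
        return NFS4ERR_GRACE
    return NFS4_OK
```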
13.7.4. NFS4ERR_RECLAIM_BAD (as updated; Error Code 10034) | ||||
The server has determined that a reclaim attempted by the client is | ||||
not valid, i.e. the lock specified as being reclaimed could not | ||||
possibly have existed before the server restart or file system | ||||
migration event. A server is not obliged to make this determination | ||||
and will typically rely on the client to only reclaim locks that the | ||||
client was granted prior to restart or file system migration. | ||||
However, when a server does have reliable information to enable it | ||||
to make this determination, this error indicates that the reclaim has | ||||
been rejected as invalid. This is as opposed to the error | ||||
NFS4ERR_RECLAIM_CONFLICT (see Section 13.7.5) where the server can | ||||
only determine that there has been an invalid reclaim, but cannot | ||||
determine which request is invalid. | ||||
13.7.5. NFS4ERR_RECLAIM_CONFLICT (as updated; Error Code 10035) | ||||
The reclaim attempted by the client has encountered a conflict and | ||||
cannot be satisfied. This potentially indicates a misbehaving client, | ||||
although not necessarily the one receiving the error. The | ||||
misbehavior might be on the part of the client that established the | ||||
lock with which this client conflicted. See also Section 13.7.4 for | ||||
the related error, NFS4ERR_RECLAIM_BAD. | ||||
14. Operation 42: EXCHANGE_ID - Instantiate Client ID (as updated) | ||||
The EXCHANGE_ID operation exchanges long-hand client and server identifiers | The EXCHANGE_ID operation exchanges long-hand client and server identifiers | |||
(owners), and provides access to a client ID, creating one if | (owners), and provides access to a client ID, creating one if | |||
necessary. This client ID becomes associated with the connection on | necessary. This client ID becomes associated with the connection on | |||
which the operation is done, so that it is available when a | which the operation is done, so that it is available when a | |||
CREATE_SESSION is done or when the connection is used to issue a | CREATE_SESSION is done or when the connection is used to issue a | |||
request on an existing session associated with the current client. | request on an existing session associated with the current client. | |||
13.1. ARGUMENT | 14.1. ARGUMENT | |||
<CODE BEGINS> | ||||
const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; | const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; | |||
const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; | const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; | |||
const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; | const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; | |||
const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; | const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; | |||
const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; | const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; | |||
const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; | const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; | |||
skipping to change at page 51, line 24 ¶ | skipping to change at page 69, line 42 ¶ | |||
ssv_sp_parms4 spa_ssv_parms; | ssv_sp_parms4 spa_ssv_parms; | |||
}; | }; | |||
struct EXCHANGE_ID4args { | struct EXCHANGE_ID4args { | |||
client_owner4 eia_clientowner; | client_owner4 eia_clientowner; | |||
uint32_t eia_flags; | uint32_t eia_flags; | |||
state_protect4_a eia_state_protect; | state_protect4_a eia_state_protect; | |||
nfs_impl_id4 eia_client_impl_id<1>; | nfs_impl_id4 eia_client_impl_id<1>; | |||
}; | }; | |||
13.2. RESULT | <CODE ENDS> | |||
14.2. RESULT | ||||
<CODE BEGINS> | ||||
struct ssv_prot_info4 { | struct ssv_prot_info4 { | |||
state_protect_ops4 spi_ops; | state_protect_ops4 spi_ops; | |||
uint32_t spi_hash_alg; | uint32_t spi_hash_alg; | |||
uint32_t spi_encr_alg; | uint32_t spi_encr_alg; | |||
uint32_t spi_ssv_len; | uint32_t spi_ssv_len; | |||
uint32_t spi_window; | uint32_t spi_window; | |||
gsshandle4_t spi_handles<>; | gsshandle4_t spi_handles<>; | |||
}; | }; | |||
union state_protect4_r switch(state_protect_how4 spr_how) { | union state_protect4_r switch(state_protect_how4 spr_how) { | |||
skipping to change at page 52, line 40 ¶ | skipping to change at page 70, line 42 ¶ | |||
}; | }; | |||
union EXCHANGE_ID4res switch (nfsstat4 eir_status) { | union EXCHANGE_ID4res switch (nfsstat4 eir_status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
EXCHANGE_ID4resok eir_resok4; | EXCHANGE_ID4resok eir_resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
13.3. DESCRIPTION | <CODE ENDS> | |||
14.3. DESCRIPTION | ||||
The client uses the EXCHANGE_ID operation to register a particular | The client uses the EXCHANGE_ID operation to register a particular | |||
client_owner with the server. However, when the client_owner has | client_owner with the server. However, when the client_owner has | |||
already been registered by other means (e.g. Transparent State | already been registered by other means (e.g. Transparent State | |||
Migration), the client may still use EXCHANGE_ID to obtain the client | Migration), the client may still use EXCHANGE_ID to obtain the client | |||
ID assigned previously. | ID assigned previously. | |||
The client ID returned from this operation will be associated with | The client ID returned from this operation will be associated with | |||
the connection on which the EXCHANGE_ID is received and will serve as | the connection on which the EXCHANGE_ID is received and will serve as | |||
a parent object for sessions created by the client on this connection | a parent object for sessions created by the client on this connection | |||
skipping to change at page 53, line 41 ¶ | skipping to change at page 71, line 46 ¶ | |||
The eia_clientowner field is composed of a co_verifier field and a | The eia_clientowner field is composed of a co_verifier field and a | |||
co_ownerid string. As noted in section 2.4 of [RFC5661], the | co_ownerid string. As noted in section 2.4 of [RFC5661], the | |||
co_ownerid describes the client, and the co_verifier is the | co_ownerid describes the client, and the co_verifier is the | |||
incarnation of the client. An EXCHANGE_ID sent with a new | incarnation of the client. An EXCHANGE_ID sent with a new | |||
incarnation of the client will lead to the server removing lock state | incarnation of the client will lead to the server removing lock state | |||
of the old incarnation, whereas an EXCHANGE_ID sent with the current | of the old incarnation, whereas an EXCHANGE_ID sent with the current | |||
incarnation and co_ownerid will result in an error or an update of | incarnation and co_ownerid will result in an error or an update of | |||
the client ID's properties, depending on the arguments to | the client ID's properties, depending on the arguments to | |||
EXCHANGE_ID. | EXCHANGE_ID. | |||
A server MUST NOT use the same client ID for two different | A server MUST NOT provide the same client ID to two different | |||
incarnations of an eia_clientowner. | incarnations of an eia_clientowner. | |||
In addition to the client ID and sequence ID, the server returns a | In addition to the client ID and sequence ID, the server returns a | |||
server owner (eir_server_owner) and server scope (eir_server_scope). | server owner (eir_server_owner) and server scope (eir_server_scope). | |||
The former field is used for network trunking as described in | The former field is used in connection with network trunking as | |||
Section 2.10.5 of [RFC5661]. The latter field is used to allow | described in Section 2.10.5 of [RFC5661]. The latter field is used | |||
clients to determine when client IDs sent by one server may be | to allow clients to determine when client IDs sent by one server may | |||
recognized by another in the event of file system migration (see | be recognized by another in the event of file system migration (see | |||
Section 8.9 of the current document). | Section 8.9 of the current document). | |||
The client ID returned by EXCHANGE_ID is only unique relative to the | The client ID returned by EXCHANGE_ID is only unique relative to the | |||
combination of eir_server_owner.so_major_id and eir_server_scope. | combination of eir_server_owner.so_major_id and eir_server_scope. | |||
Thus, if two servers return the same client ID, the onus is on the | Thus, if two servers return the same client ID, the onus is on the | |||
client to distinguish the client IDs on the basis of | client to distinguish the client IDs on the basis of | |||
eir_server_owner.so_major_id and eir_server_scope. In the event two | eir_server_owner.so_major_id and eir_server_scope. In the event two | |||
different servers claim matching eir_server_owner.so_major_id and | different servers claim matching eir_server_owner.so_major_id and | |||
eir_server_scope, the client can use the verification techniques | eir_server_scope, the client can use the verification techniques | |||
discussed in Section 2.10.5 of [RFC5661] to determine if the servers | discussed in Section 2.10.5 of [RFC5661] to determine if the servers | |||
skipping to change at page 54, line 39 ¶ | skipping to change at page 72, line 42 ¶ | |||
* EXCHGID4_FLAG_SUPP_MOVED_MIGR | * EXCHGID4_FLAG_SUPP_MOVED_MIGR | |||
* EXCHGID4_FLAG_BIND_PRINC_STATEID | * EXCHGID4_FLAG_BIND_PRINC_STATEID | |||
* EXCHGID4_FLAG_USE_NON_PNFS | * EXCHGID4_FLAG_USE_NON_PNFS | |||
* EXCHGID4_FLAG_USE_PNFS_MDS | * EXCHGID4_FLAG_USE_PNFS_MDS | |||
* EXCHGID4_FLAG_USE_PNFS_DS | * EXCHGID4_FLAG_USE_PNFS_DS | |||
These properties may be updated by subsequent EXCHANGE_ID requests | These properties may be updated by subsequent EXCHANGE_ID | |||
on confirmed client IDs though the server MAY refuse to change | operations on confirmed client IDs though the server MAY refuse to | |||
them. | change them. | |||
o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, | o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, | |||
or SP4_SSV, as set by the spa_how field of the arguments to | or SP4_SSV, as set by the spa_how field of the arguments to | |||
EXCHANGE_ID. Once the client ID is confirmed, this property | EXCHANGE_ID. Once the client ID is confirmed, this property | |||
cannot be updated by subsequent EXCHANGE_ID requests. | cannot be updated by subsequent EXCHANGE_ID operations. | |||
o For SP4_MACH_CRED or SP4_SSV state protection: | o For SP4_MACH_CRED or SP4_SSV state protection: | |||
* The list of operations (spo_must_enforce) that MUST use the | * The list of operations (spo_must_enforce) that MUST use the | |||
specified state protection. This list comes from the results | specified state protection. This list comes from the results | |||
of EXCHANGE_ID. | of EXCHANGE_ID. | |||
* The list of operations (spo_must_allow) that MAY use the | * The list of operations (spo_must_allow) that MAY use the | |||
specified state protection. This list comes from the results | specified state protection. This list comes from the results | |||
of EXCHANGE_ID. | of EXCHANGE_ID. | |||
skipping to change at page 55, line 28 ¶ | skipping to change at page 73, line 32 ¶ | |||
* The OID of the encryption algorithm. This property is | * The OID of the encryption algorithm. This property is | |||
represented by one of the algorithms in the ssp_encr_algs field | represented by one of the algorithms in the ssp_encr_algs field | |||
of the EXCHANGE_ID arguments. Once the client ID is confirmed, | of the EXCHANGE_ID arguments. Once the client ID is confirmed, | |||
this property cannot be updated by subsequent EXCHANGE_ID | this property cannot be updated by subsequent EXCHANGE_ID | |||
requests. | requests. | |||
* The length of the SSV. This property is represented by the | * The length of the SSV. This property is represented by the | |||
spi_ssv_len field in the EXCHANGE_ID results. Once the client | spi_ssv_len field in the EXCHANGE_ID results. Once the client | |||
ID is confirmed, this property cannot be updated by subsequent | ID is confirmed, this property cannot be updated by subsequent | |||
EXCHANGE_ID requests. | EXCHANGE_ID operations. | |||
There are REQUIRED and RECOMMENDED relationships among the | There are REQUIRED and RECOMMENDED relationships among the | |||
length of the key of the encryption algorithm ("key length"), | length of the key of the encryption algorithm ("key length"), | |||
the length of the output of the hash algorithm ("hash length"), and | the length of the output of the hash algorithm ("hash length"), and | |||
the length of the SSV ("SSV length"). | the length of the SSV ("SSV length"). | |||
+ key length MUST be <= hash length. This is because the keys | + key length MUST be <= hash length. This is because the keys | |||
used for the encryption algorithm are actually subkeys | used for the encryption algorithm are actually subkeys | |||
derived from the SSV, and the derivation is via the hash | derived from the SSV, and the derivation is via the hash | |||
algorithm. The selection of an encryption algorithm with a | algorithm. The selection of an encryption algorithm with a | |||
skipping to change at page 56, line 14 ¶ | skipping to change at page 74, line 17 ¶ | |||
+ key length SHOULD be >= hash length / 2. This is because | + key length SHOULD be >= hash length / 2. This is because | |||
the subkey derivation is via an HMAC and it is recommended | the subkey derivation is via an HMAC and it is recommended | |||
that if the HMAC has to be truncated, it should not be | that if the HMAC has to be truncated, it should not be | |||
truncated to less than half the hash length (see Section 4 | truncated to less than half the hash length (see Section 4 | |||
of RFC2104 [RFC2104]). | of RFC2104 [RFC2104]). | |||
* Number of concurrent versions of the SSV the client and server | * Number of concurrent versions of the SSV the client and server | |||
will support (see Section 2.10.9 of [RFC5661]). This property | will support (see Section 2.10.9 of [RFC5661]). This property | |||
is represented by spi_window in the EXCHANGE_ID results. The | is represented by spi_window in the EXCHANGE_ID results. The | |||
property may be updated by subsequent EXCHANGE_ID requests. | property may be updated by subsequent EXCHANGE_ID operations. | |||
o The client's implementation ID as represented by the | o The client's implementation ID as represented by the | |||
eia_client_impl_id field of the arguments. The property may be | eia_client_impl_id field of the arguments. The property may be | |||
updated by subsequent EXCHANGE_ID requests. | updated by subsequent EXCHANGE_ID requests. | |||
o The server's implementation ID as represented by the | o The server's implementation ID as represented by the | |||
eir_server_impl_id field of the reply. The property may be | eir_server_impl_id field of the reply. The property may be | |||
updated by replies to subsequent EXCHANGE_ID requests. | updated by replies to subsequent EXCHANGE_ID requests. | |||
The eia_flags passed as part of the arguments and the eir_flags | The eia_flags passed as part of the arguments and the eir_flags | |||
skipping to change at page 56, line 51 ¶ | skipping to change at page 75, line 5 ¶ | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means | |||
that the client is attempting to update properties of an existing | that the client is attempting to update properties of an existing | |||
confirmed client ID (if the client wants to update properties of an | confirmed client ID (if the client wants to update properties of an | |||
unconfirmed client ID, it MUST NOT set | unconfirmed client ID, it MUST NOT set | |||
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that | |||
the client send the update EXCHANGE_ID operation in the same COMPOUND | the client send the update EXCHANGE_ID operation in the same COMPOUND | |||
as a SEQUENCE so that the EXCHANGE_ID is executed exactly once. | as a SEQUENCE so that the EXCHANGE_ID is executed exactly once. | |||
Whether the client can update the properties of client ID depends on | Whether the client can update the properties of client ID depends on | |||
the state protection it selected when the client ID was created, and | the state protection it selected when the client ID was created, and | |||
the principal and security flavor it uses when sending the | the principal and security flavor it uses when sending the | |||
EXCHANGE_ID request. The situations described in items 6, 7, 8, or 9 | EXCHANGE_ID operation. The situations described in items 6, 7, 8, or | |||
of the second numbered list of Section 13.4 below will apply. Note | 9 of the second numbered list of Section 14.4 below will apply. Note | |||
that if the operation succeeds and returns a client ID that is | that if the operation succeeds and returns a client ID that is | |||
already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R | already confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R | |||
bit in eir_flags. | bit in eir_flags. | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this | |||
means that the client is trying to establish a new client ID; it is | means that the client is trying to establish a new client ID; it is | |||
attempting to trunk data communication to the server (See | attempting to trunk data communication to the server (See | |||
Section 2.10.5 of [RFC5661]); or it is attempting to update | Section 2.10.5 of [RFC5661]); or it is attempting to update | |||
properties of an unconfirmed client ID. The situations described in | properties of an unconfirmed client ID. The situations described in | |||
items 1, 2, 3, 4, or 5 of the second numbered list of Section 13.4 | items 1, 2, 3, 4, or 5 of the second numbered list of Section 14.4 | |||
below will apply. Note that if the operation succeeds and returns a | below will apply. Note that if the operation succeeds and returns a | |||
client ID that was previously confirmed, the server MUST set the | client ID that was previously confirmed, the server MUST set the | |||
EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. | EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. | |||
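
The flag handling described in the two paragraphs above can be sketched as follows. This is an illustration only, not text from either draft: the function names are hypothetical, and the numeric flag values are assumed to be those defined in the XDR of [RFC5661] and should be checked against it.

```python
# Assumed flag values (taken to match the RFC 5661 XDR; verify before use).
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000


def applicable_cases(eia_flags):
    """Return the item numbers of the IMPLEMENTATION list that can apply.

    Hypothetical helper: with the update flag set, items 6-9 apply;
    without it, items 1-5 apply.
    """
    if eia_flags & EXCHGID4_FLAG_UPD_CONFIRMED_REC_A:
        return (6, 7, 8, 9)
    return (1, 2, 3, 4, 5)


def reply_flags(eir_flags, client_id_already_confirmed):
    """Set EXCHGID4_FLAG_CONFIRMED_R when the returned client ID is confirmed."""
    if client_id_already_confirmed:
        eir_flags |= EXCHGID4_FLAG_CONFIRMED_R
    return eir_flags
```

The sketch only selects which group of cases is relevant; the checks on owner, verifier, and principal that pick a single case are described in the IMPLEMENTATION section.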
When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client | When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client | |||
indicates that it is capable of dealing with an NFS4ERR_MOVED error | indicates that it is capable of dealing with an NFS4ERR_MOVED error | |||
as part of a referral sequence. When this bit is not set, it is | as part of a referral sequence. When this bit is not set, it is | |||
still legal for the server to perform a referral sequence. However, | still legal for the server to perform a referral sequence. However, | |||
a server may use the fact that the client is incapable of correctly | a server may use the fact that the client is incapable of correctly | |||
responding to a referral, by avoiding it for that particular client. | responding to a referral, by avoiding it for that particular client. | |||
skipping to change at page 58, line 33 ¶ | skipping to change at page 76, line 36 ¶ | |||
The spa_how field of the eia_state_protect field specifies how the | The spa_how field of the eia_state_protect field specifies how the | |||
client wants to protect its client, locking, and session states from | client wants to protect its client, locking, and session states from | |||
unauthorized changes (Section 2.10.8.3 of [RFC5661]): | unauthorized changes (Section 2.10.8.3 of [RFC5661]): | |||
o SP4_NONE. The client does not request the NFSv4.1 server to | o SP4_NONE. The client does not request the NFSv4.1 server to | |||
enforce state protection. The NFSv4.1 server MUST NOT enforce | enforce state protection. The NFSv4.1 server MUST NOT enforce | |||
state protection for the returned client ID. | state protection for the returned client ID. | |||
o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST | o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST | |||
send the EXCHANGE_ID request with RPCSEC_GSS as the security | send the EXCHANGE_ID operation with RPCSEC_GSS as the security | |||
flavor, and with a service of RPC_GSS_SVC_INTEGRITY or | flavor, and with a service of RPC_GSS_SVC_INTEGRITY or | |||
RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the | RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the | |||
client wants to use an RPCSEC_GSS-based machine credential to | client wants to use an RPCSEC_GSS-based machine credential to | |||
protect its state. The server MUST note the principal the | protect its state. The server MUST note the principal the | |||
EXCHANGE_ID operation was sent with, and the GSS mechanism used. | EXCHANGE_ID operation was sent with, and the GSS mechanism used. | |||
These notes collectively comprise the machine credential. | These notes collectively comprise the machine credential. | |||
After the client ID is confirmed, as long as the lease associated | After the client ID is confirmed, as long as the lease associated | |||
with the client ID is unexpired, a subsequent EXCHANGE_ID | with the client ID is unexpired, a subsequent EXCHANGE_ID | |||
operation that uses the same eia_clientowner.co_owner as the first | operation that uses the same eia_clientowner.co_owner as the first | |||
EXCHANGE_ID MUST also use the same machine credential as the first | EXCHANGE_ID MUST also use the same machine credential as the first | |||
EXCHANGE_ID. The server returns the same client ID for the | EXCHANGE_ID. The server returns the same client ID for the | |||
subsequent EXCHANGE_ID as that returned from the first | subsequent EXCHANGE_ID as that returned from the first | |||
EXCHANGE_ID. | EXCHANGE_ID. | |||
o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the | o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the | |||
EXCHANGE_ID request with RPCSEC_GSS as the security flavor, and | EXCHANGE_ID operation with RPCSEC_GSS as the security flavor, and | |||
with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. | with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. | |||
If SP4_SSV is specified, then the client wants to use the SSV to | If SP4_SSV is specified, then the client wants to use the SSV to | |||
protect its state. The server records the credential used in the | protect its state. The server records the credential used in the | |||
request as the machine credential (as defined above) for the | request as the machine credential (as defined above) for the | |||
eia_clientowner.co_owner. The CREATE_SESSION operation that | eia_clientowner.co_owner. The CREATE_SESSION operation that | |||
confirms the client ID MUST use the same machine credential. | confirms the client ID MUST use the same machine credential. | |||
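
As a rough illustration of the security-flavor requirements in the list above, a server-side check might look like the sketch below. The function name and the use of strings for the enumerated values are assumptions made for readability; the drafts do not define such a helper, and the error a server returns on failure is not shown here.

```python
def exchange_id_security_ok(spa_how, flavor, service):
    """Hypothetical check of the flavor/service rules for EXCHANGE_ID.

    SP4_NONE places no requirement on the flavor. SP4_MACH_CRED and
    SP4_SSV require RPCSEC_GSS with an integrity or privacy service.
    """
    if spa_how == "SP4_NONE":
        return True
    if spa_how in ("SP4_MACH_CRED", "SP4_SSV"):
        return flavor == "RPCSEC_GSS" and service in (
            "RPC_GSS_SVC_INTEGRITY",
            "RPC_GSS_SVC_PRIVACY",
        )
    raise ValueError("unknown spa_how value: %r" % (spa_how,))
```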
When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides | When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides | |||
two lists of operations (each expressed as a bitmap). The first list | two lists of operations (each expressed as a bitmap). The first list | |||
is spo_must_enforce and consists of those operations the client MUST | is spo_must_enforce and consists of those operations the client MUST | |||
send (subject to the server confirming the list of operations in the | send (subject to the server confirming the list of operations in the | |||
skipping to change at page 61, line 49 ¶ | skipping to change at page 80, line 5 ¶ | |||
ssp_num_gss_handles to zero; the client can create more handles | ssp_num_gss_handles to zero; the client can create more handles | |||
with another EXCHANGE_ID call. | with another EXCHANGE_ID call. | |||
Because each SSV RPCSEC_GSS handle shares a common SSV GSS | Because each SSV RPCSEC_GSS handle shares a common SSV GSS | |||
context, there are security considerations specific to this | context, there are security considerations specific to this | |||
situation discussed in Section 2.10.10 of [RFC5661]. | situation discussed in Section 2.10.10 of [RFC5661]. | |||
The seq_window (see Section 5.2.3.1 of [RFC2203]) of each | The seq_window (see Section 5.2.3.1 of [RFC2203]) of each | |||
RPCSEC_GSS handle in spi_handle MUST be the same as the seq_window | RPCSEC_GSS handle in spi_handle MUST be the same as the seq_window | |||
of the RPCSEC_GSS handle used for the credential of the RPC | of the RPCSEC_GSS handle used for the credential of the RPC | |||
request that the EXCHANGE_ID request was sent with. | request that the EXCHANGE_ID operation was sent as a part of. | |||
+-------------------+----------------------+------------------------+ | +-------------------+----------------------+------------------------+ | |||
| Encryption | MUST NOT be combined | SHOULD NOT be combined | | | Encryption | MUST NOT be combined | SHOULD NOT be combined | | |||
| Algorithm | with | with | | | Algorithm | with | with | | |||
+-------------------+----------------------+------------------------+ | +-------------------+----------------------+------------------------+ | |||
| id-aes128-CBC | | id-sha384, id-sha512 | | | id-aes128-CBC | | id-sha384, id-sha512 | | |||
| id-aes192-CBC | id-sha1 | id-sha512 | | | id-aes192-CBC | id-sha1 | id-sha512 | | |||
| id-aes256-CBC | id-sha1, id-sha224 | | | | id-aes256-CBC | id-sha1, id-sha224 | | | |||
+-------------------+----------------------+------------------------+ | +-------------------+----------------------+------------------------+ | |||
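
The table above follows from the two length rules stated earlier (key length MUST be <= hash length; key length SHOULD be >= hash length / 2). The sketch below recomputes each cell from the algorithm sizes. It is illustrative only: the dictionaries and the function name are not part of either draft, and the sizes are the standard key and digest lengths in bits.

```python
# Standard key sizes (bits) for the AES-CBC variants named in the table.
KEY_BITS = {"id-aes128-CBC": 128, "id-aes192-CBC": 192, "id-aes256-CBC": 256}

# Standard digest sizes (bits) for the SHA family.
HASH_BITS = {
    "id-sha1": 160,
    "id-sha224": 224,
    "id-sha256": 256,
    "id-sha384": 384,
    "id-sha512": 512,
}


def classify(encr_alg, hash_alg):
    """Classify a combination as "MUST NOT", "SHOULD NOT", or "OK"."""
    key_len = KEY_BITS[encr_alg]
    hash_len = HASH_BITS[hash_alg]
    if key_len > hash_len:
        # Violates: key length MUST be <= hash length.
        return "MUST NOT"
    if 2 * key_len < hash_len:
        # Violates: key length SHOULD be >= hash length / 2.
        return "SHOULD NOT"
    return "OK"
```

For example, classify("id-aes192-CBC", "id-sha1") yields "MUST NOT" because a 192-bit subkey cannot be derived from a 160-bit hash output.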
skipping to change at page 62, line 43 ¶ | skipping to change at page 80, line 45 ¶ | |||
peer's manifesting a particular allowed behavior based on an | peer's manifesting a particular allowed behavior based on an | |||
implementation identifier but are required to interoperate as | implementation identifier but are required to interoperate as | |||
specified elsewhere in the protocol specification. | specified elsewhere in the protocol specification. | |||
Because it is possible that some implementations might violate the | Because it is possible that some implementations might violate the | |||
protocol specification and interpret the identity information, | protocol specification and interpret the identity information, | |||
implementations MUST provide facilities to allow the NFSv4 client and | implementations MUST provide facilities to allow the NFSv4 client and | |||
server to be configured to set the contents of the nfs_impl_id | server to be configured to set the contents of the nfs_impl_id | |||
structures sent to any specified value. | structures sent to any specified value. | |||
13.4. IMPLEMENTATION | 14.4. IMPLEMENTATION | |||
A server's client record is a 5-tuple: | A server's client record is a 5-tuple: | |||
1. co_ownerid | 1. co_ownerid | |||
The client identifier string, from the eia_clientowner | The client identifier string, from the eia_clientowner | |||
structure of the EXCHANGE_ID4args structure. | structure of the EXCHANGE_ID4args structure. | |||
2. co_verifier: | 2. co_verifier: | |||
A client-specific value used to indicate incarnations (where a | A client-specific value used to indicate incarnations (where a | |||
client restart represents a new incarnation), from the | client restart represents a new incarnation), from the | |||
eia_clientowner structure of the EXCHANGE_ID4args structure. | eia_clientowner structure of the EXCHANGE_ID4args structure. | |||
3. principal: | 3. principal: | |||
skipping to change at page 67, line 38 ¶ | skipping to change at page 85, line 42 ¶ | |||
+ If the server subsequently receives a successful | + If the server subsequently receives a successful | |||
CREATE_SESSION that confirms clientid_ret, then the server | CREATE_SESSION that confirms clientid_ret, then the server | |||
atomically destroys the confirmed record and makes the | atomically destroys the confirmed record and makes the | |||
unconfirmed record confirmed as described in section | unconfirmed record confirmed as described in section | |||
16.36.3 of [RFC5661]. | 16.36.3 of [RFC5661]. | |||
+ If the server instead subsequently receives an EXCHANGE_ID | + If the server instead subsequently receives an EXCHANGE_ID | |||
with the client owner equal to ownerid_arg, one strategy is | with the client owner equal to ownerid_arg, one strategy is | |||
to simply delete the unconfirmed record, and process the | to simply delete the unconfirmed record, and process the | |||
EXCHANGE_ID as described in the entirety of Section 13.4. | EXCHANGE_ID as described in the entirety of Section 14.4. | |||
6. Update | 6. Update | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | |||
has the following confirmed record, then this request is an | has the following confirmed record, then this request is an | |||
attempt at an update. | attempt at an update. | |||
{ ownerid_arg, verifier_arg, principal_arg, clientid_ret, | { ownerid_arg, verifier_arg, principal_arg, clientid_ret, | |||
confirmed } | confirmed } | |||
Since the record has been confirmed, the client must have | Since the record has been confirmed, the client must have | |||
received the server's reply from the initial EXCHANGE_ID | received the server's reply from the initial EXCHANGE_ID | |||
request. The server allows the update, and the client record | request. The server allows the update, and the client record | |||
is left intact. | is left intact. | |||
7. Update but No Confirmed Record | 7. Update but No Confirmed Record | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | |||
has no confirmed record corresponding to ownerid_arg, then the | has no confirmed record corresponding to ownerid_arg, then the | |||
server returns NFS4ERR_NOENT and leaves any unconfirmed record | server returns NFS4ERR_NOENT and leaves any unconfirmed record | |||
skipping to change at page 68, line 40 ¶ | skipping to change at page 86, line 44 ¶ | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | |||
has the following confirmed record, then this request is an | has the following confirmed record, then this request is an | |||
illegal attempt at an update by an unauthorized principal. | illegal attempt at an update by an unauthorized principal. | |||
{ ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, | { ownerid_arg, verifier_arg, old_principal_arg, clientid_ret, | |||
confirmed } | confirmed } | |||
The server returns NFS4ERR_PERM and leaves the client record | The server returns NFS4ERR_PERM and leaves the client record | |||
intact. | intact. | |||
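
The update cases visible in this excerpt can be sketched as a single lookup, shown below. This is a simplified illustration under stated assumptions: the confirmed records are modeled as a hypothetical dictionary keyed by owner, status codes are plain strings, and the verifier-mismatch case is deliberately left unimplemented because its text is not part of this excerpt.

```python
def handle_update(confirmed, ownerid_arg, verifier_arg, principal_arg):
    """Process an EXCHANGE_ID with EXCHGID4_FLAG_UPD_CONFIRMED_REC_A set.

    `confirmed` is a hypothetical mapping:
        ownerid -> (verifier, principal, clientid)
    The client record is never modified here, matching "left intact".
    """
    record = confirmed.get(ownerid_arg)
    if record is None:
        # Update but no confirmed record.
        return "NFS4ERR_NOENT"
    verifier, principal, _clientid = record
    if verifier != verifier_arg:
        # Not covered by this excerpt; intentionally not guessed at.
        raise NotImplementedError("verifier mismatch is outside this sketch")
    if principal != principal_arg:
        # Update attempted by an unauthorized principal.
        return "NFS4ERR_PERM"
    # Matching confirmed record: the update is allowed.
    return "NFS4_OK"
```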
14. Security Considerations | 15. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished (as | |||
updated) | ||||
15.1. ARGUMENT | ||||
<CODE BEGINS> | ||||
struct RECLAIM_COMPLETE4args { | ||||
/* | ||||
* If rca_one_fs TRUE, | ||||
* | ||||
* CURRENT_FH: object in | ||||
* file system reclaim is | ||||
* complete for. | ||||
*/ | ||||
bool rca_one_fs; | ||||
}; | ||||
<CODE ENDS> | ||||
15.2. RESULTS | ||||
<CODE BEGINS> | ||||
struct RECLAIM_COMPLETE4res { | ||||
nfsstat4 rcr_status; | ||||
}; | ||||
<CODE ENDS> | ||||
15.3. DESCRIPTION | ||||
A RECLAIM_COMPLETE operation is used to indicate that the client has | ||||
reclaimed all of the locking state that it will recover using | ||||
reclaim, when it is recovering state due to either a server restart | ||||
or the migration of a file system to another server. There are two | ||||
types of RECLAIM_COMPLETE operations: | ||||
o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done. | ||||
This indicates that recovery of all locks that the client held on | ||||
the previous server instance has been completed. The current | ||||
filehandle need not be set in this case. | ||||
o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE | ||||
is being done. This indicates that recovery of locks for a single | ||||
fs (the one designated by the current filehandle) due to the | ||||
migration of the file system has been completed. Presence of a | ||||
current filehandle is required when rca_one_fs is set to TRUE. | ||||
When the current filehandle designates a filehandle in a file | ||||
system not in the process of migration, the operation returns | ||||
NFS4_OK and is otherwise ignored. | ||||
Once a RECLAIM_COMPLETE is done, there can be no further reclaim | ||||
operations for locks whose scope is defined as having completed | ||||
recovery. Once the client sends RECLAIM_COMPLETE, the server will | ||||
not allow the client to do subsequent reclaims of locking state for | ||||
that scope and, if these are attempted, will return NFS4ERR_NO_GRACE. | ||||
Whenever a client establishes a new client ID and before it does the | ||||
first non-reclaim operation that obtains a lock, it MUST send a | ||||
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no | ||||
locks to reclaim. If non-reclaim locking operations are done before | ||||
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. | ||||
Similarly, when the client accesses a migrated file system on a new | ||||
server, before it sends the first non-reclaim operation that obtains | ||||
a lock on this new server, it MUST send a RECLAIM_COMPLETE with | ||||
rca_one_fs set to TRUE and current filehandle within that file | ||||
system, even if there are no locks to reclaim. If non-reclaim | ||||
locking operations are done on that file system before the | ||||
RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. | ||||
It should be noted that there are situations in which a client needs | ||||
to issue both forms of RECLAIM_COMPLETE. An example is an instance | ||||
of file system migration in which the file system is migrated to a | ||||
server for which the client has no clientid. As a result, the client | ||||
needs to obtain a clientid from the server (incurring the | ||||
responsibility to do RECLAIM_COMPLETE with rca_one_fs set to FALSE) | ||||
as well as RECLAIM_COMPLETE with rca_one_fs set to TRUE to complete | ||||
the per-fs grace period associated with the file system migration. | ||||
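
The client obligations described above (a global RECLAIM_COMPLETE for each new client ID, and a per-fs one for each migrated file system) can be sketched as a small planning function. The function name, the dictionary representation of an operation, and the choice to list the global form first are assumptions of this sketch, not requirements stated in the draft.

```python
def reclaim_completes_needed(new_client_id, migrated_fs_handles):
    """Return the RECLAIM_COMPLETE operations a client still owes a server.

    new_client_id: True if the client has just established a client ID
        on this server.
    migrated_fs_handles: filehandles, one per file system that has
        migrated to this server (hypothetical representation).
    """
    ops = []
    if new_client_id:
        # Global form: the current filehandle need not be set.
        ops.append({"rca_one_fs": False, "current_fh": None})
    for fh in migrated_fs_handles:
        # Per-fs form: current filehandle must lie within the file system.
        ops.append({"rca_one_fs": True, "current_fh": fh})
    return ops
```

A client that follows a migration to a server where it holds no client ID would call this with new_client_id set to True and one filehandle, and so would send both forms.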
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is | ||||
done become non-reclaimable. The client MUST NOT attempt to reclaim | ||||
them, either during the current server instance or in any subsequent | ||||
server instance, or on another server to which responsibility for | ||||
that file system is transferred. If the client were to do so, it | ||||
would be violating the protocol by representing itself as owning | ||||
locks that it does not own, and so has no right to reclaim. See | ||||
Section 8.4.3 of [RFC5661] for a discussion of edge conditions | ||||
related to lock reclaim. | ||||
By sending a RECLAIM_COMPLETE, the client indicates readiness to | ||||
proceed to do normal non-reclaim locking operations. The client | ||||
should be aware that such operations may temporarily result in | ||||
NFS4ERR_GRACE errors until the server is ready to terminate its grace | ||||
period. | ||||
15.4. IMPLEMENTATION | ||||
Servers will typically use the information as to when reclaim | ||||
activity is complete to reduce the length of the grace period. When | ||||
the server maintains in persistent storage a list of clients that | ||||
might have had locks, it is able to use the fact that all such | ||||
clients have done a RECLAIM_COMPLETE to terminate the grace period | ||||
and begin normal operations (i.e., grant requests for new locks) | ||||
sooner than it might otherwise. | ||||
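
One way a server could apply this is sketched below: it compares the persistently stored set of clients that might have held locks against the set that has already sent RECLAIM_COMPLETE. The function and parameter names are hypothetical, and the fallback to a grace-period deadline is an assumption about typical server behavior rather than something stated in this section.

```python
def grace_period_can_end(clients_with_possible_locks, clients_done, deadline_passed):
    """Decide whether the grace period may be terminated.

    clients_with_possible_locks: set of client identifiers read from
        persistent storage.
    clients_done: set of client identifiers that have sent a global
        RECLAIM_COMPLETE.
    deadline_passed: True once the normal grace period has elapsed.
    """
    # Early termination is possible when every client that might have
    # held locks has reported that its reclaims are finished.
    all_reported = clients_with_possible_locks <= clients_done
    return deadline_passed or all_reported
```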
Latency can be minimized by doing a RECLAIM_COMPLETE as part of the | ||||
COMPOUND request in which the last lock-reclaiming operation is done. | ||||
When there are no reclaims to be done, RECLAIM_COMPLETE should be | ||||
done immediately in order to allow the grace period to end as soon as | ||||
possible. | ||||
RECLAIM_COMPLETE should only be done once for each server instance or | ||||
occasion of the transition of a file system. If it is done a second | ||||
time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that | ||||
because of the session feature's retry protection, retries of | ||||
COMPOUND requests containing a RECLAIM_COMPLETE operation will not | ||||
result in this error. | ||||
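
The error behavior described in this section and in the DESCRIPTION can be summarized by a small per-scope state tracker, sketched below. The class and method names are hypothetical, status codes are plain strings, and session replay handling is not modeled: a retried COMPOUND would be answered from the session reply cache before reaching logic like this.

```python
class ReclaimScope:
    """Tracks RECLAIM_COMPLETE for one scope (global, or one file system)."""

    def __init__(self):
        self.complete = False

    def reclaim_complete(self):
        # A second RECLAIM_COMPLETE for the same scope is an error.
        if self.complete:
            return "NFS4ERR_COMPLETE_ALREADY"
        self.complete = True
        return "NFS4_OK"

    def lock_request(self, is_reclaim, server_in_grace):
        if is_reclaim:
            # Reclaims are refused once the scope has completed recovery.
            return "NFS4ERR_NO_GRACE" if self.complete else "NFS4_OK"
        # Non-reclaim locks wait for RECLAIM_COMPLETE, and may still be
        # delayed until the server is ready to end its grace period.
        if not self.complete or server_in_grace:
            return "NFS4ERR_GRACE"
        return "NFS4_OK"
```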
When a RECLAIM_COMPLETE is sent, the client effectively acknowledges | ||||
any locks not yet reclaimed as lost. This allows the server to re- | ||||
enable the client to recover locks if the occurrence of edge | ||||
conditions, as described in Section 8.4.3 of [RFC5661], had caused | ||||
the server to disable the client's ability to recover locks. | ||||
Because previous descriptions of RECLAIM_COMPLETE were not | ||||
sufficiently explicit about the circumstances in which use of | ||||
RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there | ||||
have been cases in which it has been misused by clients, and cases in | ||||
which servers have, in various ways, not responded to such misuse as | ||||
described above. While clients SHOULD NOT misuse this feature and | ||||
servers SHOULD respond to such misuse as described above, | ||||
implementers need to be aware of the following considerations as they | ||||
make necessary tradeoffs between interoperability with existing | ||||
implementations and proper support for facilities to allow lock | ||||
recovery in the event of file system migration. | ||||
o When servers have no support for becoming the destination server | ||||
of a file system subject to migration, there is no possibility of | ||||
a per-fs RECLAIM_COMPLETE being done legitimately and occurrences | ||||
of it SHOULD be ignored. However, the negative consequences of | ||||
accepting mistaken use are quite limited as long as the client does not | ||||
issue it before all necessary reclaims are done. | ||||
o When a server might become the destination for a file system being | ||||
migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more | ||||
concerning. In the case in which the file system designated is | ||||
not within a per-fs grace period, it SHOULD be ignored, with the | ||||
negative consequences of accepting it being limited, as in the | ||||
case in which migration is not supported. However, if it should | ||||
encounter a file system undergoing migration, it cannot be | ||||
accepted as if it were a global RECLAIM_COMPLETE without | ||||
invalidating its intended use. | ||||
16. Security Considerations | ||||
The Security Considerations section of [RFC5661] needs the additions | The Security Considerations section of [RFC5661] needs the additions | |||
below to properly address some aspects of trunking discovery, | below to properly address some aspects of trunking discovery, | |||
referral, migration and replication. | referral, migration and replication. | |||
The possibility that requests to determine the set of network | The possibility that requests to determine the set of network | |||
addresses corresponding to a given server might be interfered with | addresses corresponding to a given server might be interfered with | |||
or have their responses corrupted needs to be taken into account. | or have their responses corrupted needs to be taken into account. | |||
In light of this, the following considerations should be taken | In light of this, the following considerations should be taken | |||
note of: | note of: | |||
skipping to change at page 70, line 45 ¶ | skipping to change at page 92, line 19 ¶ | |||
o The use of requests issued without RPCSEC_GSS (i.e. using | o The use of requests issued without RPCSEC_GSS (i.e. using | |||
AUTH_SYS), while undesirable, may not be avoidable in all | AUTH_SYS), while undesirable, may not be avoidable in all | |||
cases. Where the use of the returned information cannot be | cases. Where the use of the returned information cannot be | |||
avoided, it should be subject to filtering to eliminate the | avoided, it should be subject to filtering to eliminate the | |||
possibility that the client would treat an invalid address as | possibility that the client would treat an invalid address as | |||
if it were an NFSv4 server. The specifics will vary depending | if it were an NFSv4 server. The specifics will vary depending | |||
on the degree of network isolation and whether the request is | on the degree of network isolation and whether the request is | |||
to the referring or destination servers. | to the referring or destination servers. | |||
15. IANA Considerations | 17. IANA Considerations | |||
This document does not require actions by IANA. | This document does not require actions by IANA. | |||
16. References | 18. References | |||
16.1. Normative References | 18.1. Normative References | |||
[CSOR_AES] | [CSOR_AES] | |||
National Institute of Standards and Technology, | National Institute of Standards and Technology, | |||
"Cryptographic Algorithm Object Registration", URL | "Cryptographic Algorithm Object Registration", URL | |||
http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/ | http://csrc.nist.gov/groups/ST/crypto_apps_infra/csor/ | |||
algorithms.html, November 2007. | algorithms.html, November 2007. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
skipping to change at page 72, line 19 ¶ | skipping to change at page 93, line 36 ¶ | |||
[RFC7931] Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker, | [RFC7931] Noveck, D., Ed., Shivam, P., Lever, C., and B. Baker, | |||
"NFSv4.0 Migration: Specification Update", RFC 7931, | "NFSv4.0 Migration: Specification Update", RFC 7931, | |||
DOI 10.17487/RFC7931, July 2016, | DOI 10.17487/RFC7931, July 2016, | |||
<https://www.rfc-editor.org/info/rfc7931>. | <https://www.rfc-editor.org/info/rfc7931>. | |||
[RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct | [RFC8166] Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct | |||
Memory Access Transport for Remote Procedure Call Version | Memory Access Transport for Remote Procedure Call Version | |||
1", RFC 8166, DOI 10.17487/RFC8166, June 2017, | 1", RFC 8166, DOI 10.17487/RFC8166, June 2017, | |||
<https://www.rfc-editor.org/info/rfc8166>. | <https://www.rfc-editor.org/info/rfc8166>. | |||
16.2. Informative References | [RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor | |||
Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017, | ||||
<https://www.rfc-editor.org/info/rfc8178>. | ||||
18.2. Informative References | ||||
[I-D.cel-nfsv4-mv0-trunking-update] | [I-D.cel-nfsv4-mv0-trunking-update] | |||
Lever, C. and D. Noveck, "NFS version 4.0 Trunking | Lever, C. and D. Noveck, "NFS version 4.0 Trunking | |||
Update", draft-cel-nfsv4-mv0-trunking-update-00 (work in | Update", draft-cel-nfsv4-mv0-trunking-update-00 (work in | |||
progress), November 2017. | progress), November 2017. | |||
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- | [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- | |||
Hashing for Message Authentication", RFC 2104, | Hashing for Message Authentication", RFC 2104, | |||
DOI 10.17487/RFC2104, February 1997, | DOI 10.17487/RFC2104, February 1997, | |||
<https://www.rfc-editor.org/info/rfc2104>. | <https://www.rfc-editor.org/info/rfc2104>. | |||
skipping to change at page 73, line 20 ¶ | skipping to change at page 94, line 42 ¶ | |||
o Section 4.5.7 is an additional section. | o Section 4.5.7 is an additional section. | |||
o Section 5 is explanatory. | o Section 5 is explanatory. | |||
o Sections 6 and 7 are additional sections. | o Sections 6 and 7 are additional sections. | |||
o Sections 8 through 8.9, a total of ten sections, are all | o Sections 8 through 8.9, a total of ten sections, are all | |||
replacement sections. | replacement sections. | |||
o Sections 9 through 11.2, a total of eleven sections, are all | o Sections 9 through 11.3, a total of twelve sections, are all | |||
additional sections. | additional sections. | |||
o Section 12 is explanatory. | o Section 12.1 is explanatory. | |||
o Sections 12.1 and 12.2 are replacement sections. | o Sections 12.2 through 12.2.3, a total of four sections, are all | |||
replacement sections. | ||||
o Sections 12.3 and 12.4 are editing sections. | o Section 13 is explanatory. | |||
o Section 12.5 is explanatory. | o Sections 13.1 and 13.2 are replacement sections. | |||
o Section 13 is a replacement section, which consists of a total of | o Sections 13.3 and 13.4 are editing sections. | |||
o Sections 13.5 and 13.6 are explanatory. | ||||
o Section 13.7 is a replacement section, which consists of a total of | ||||
six sections. | ||||
o Section 14 is a replacement section, which consists of a total of | ||||
five sections. | five sections. | |||
o Section 14 is an editing section. | o Section 15 is a replacement section, which consists of a total of | |||
five sections. | ||||
o Section 15 through Acknowledgments, a total of six sections, are | o Section 16 is an editing section. | |||
o Section 17 through Acknowledgments, a total of six sections, are | ||||
all explanatory. | all explanatory. | |||
To summarize: | To summarize: | |||
o There are fifteen explanatory sections. | o There are seventeen explanatory sections. | |||
o There are twenty-two replacement sections. | o There are thirty-seven replacement sections. | |||
o There are eighteen additional sections. | o There are eighteen additional sections. | |||
o There are three editing sections. | o There are three editing sections. | |||
Appendix B. Updates to RFC5661 | Appendix B. Updates to RFC5661 | |||
In this appendix, we proceed through [RFC5661] identifying sections | In this appendix, we proceed through [RFC5661] identifying sections | |||
as unchanged, modified, deleted, or replaced and indicating where | as unchanged, modified, deleted, or replaced and indicating where | |||
additional sections from the current document would appear in an | additional sections from the current document would appear in an | |||
eventual consolidated description of NFSv4.1. In this presentation, | eventual consolidated description of NFSv4.1. In this presentation, | |||
when section X is referred to, it denotes that section plus all | when section X is referred to, it denotes that section plus all | |||
included subsections. When it is necessary to refer to the part of a | included subsections. When it is necessary to refer to the part of a | |||
section outside any included subsections, the exclusion is noted | section outside any included subsections, the exclusion is noted | |||
explicitly. | explicitly. | |||
o Section 1 is unmodified except that Section 1.7.3.3 is to be | o Section 1 is unmodified except that Section 1.7.3.3 is to be | |||
replaced by Section 12.1 from the current document. | replaced by Section 13.1 from the current document. | |||
o Section 2 is unmodified except for the specific items listed | o Section 2 is unmodified except for the specific items listed | |||
below: | below: | |||
o Section 2.10.4 is replaced by Section 12.2 from the current | o Section 2.10.4 is replaced by Section 13.2 from the current | |||
document. | document. | |||
o Section 2.10.5 is modified as discussed in Section 12.4 of the | o Section 2.10.5 is modified as discussed in Section 13.4 of the | |||
current document. | current document. | |||
o Sections 3 through 10 are unchanged. | o Sections 3 through 10 are unchanged. | |||
o Section 11 is extensively modified as discussed below. | o Section 11 is extensively modified as discussed below. | |||
o Section 11, exclusive of subsections, is replaced by Sections | o Section 11, exclusive of subsections, is replaced by Sections | |||
4.1 and 4.2 from the current document. | 4.1 and 4.2 from the current document. | |||
o Section 11.1 is replaced by Section 4.3 from the current | o Section 11.1 is replaced by Section 4.3 from the current | |||
skipping to change at page 75, line 39 ¶ | skipping to change at page 97, line 18 ¶ | |||
o Section 11.7.7, exclusive of subsections, is replaced by | o Section 11.7.7, exclusive of subsections, is replaced by | |||
Section 8.9. Sections 11.7.7.1 and 11.7.7.2 are unchanged. | Section 8.9. Sections 11.7.7.1 and 11.7.7.2 are unchanged. | |||
o Section 11.7.8 is replaced by Section 8.6. | o Section 11.7.8 is replaced by Section 8.6. | |||
o Section 11.7.9 is replaced by Section 8.7. | o Section 11.7.9 is replaced by Section 8.7. | |||
o Section 11.7.10 is replaced by Section 8.8. | o Section 11.7.10 is replaced by Section 8.8. | |||
o Sections 11.8, 11.8.1, 11.8.2, 11.9, 11.10, 11.10.1, 11.10.2, | o Sections 11.8, 11.8.1, 11.8.2, and 11.9 are unchanged. | |||
11.10.3, and 11.11 are unchanged. | ||||
o Sections 11.10, 11.10.1, 11.10.2, and 11.10.3 are replaced by | ||||
Sections 12.2 through 12.2.3. | ||||
o Section 11.11 is unchanged. | ||||
o New sections corresponding to Sections 9, 10, and 11 from the | o New sections corresponding to Sections 9, 10, and 11 from the | |||
current document appear next as additional sub-sections of | current document appear next as additional sub-sections of | |||
Section 11. Each of these has subsections, so there is a total | Section 11. Each of these has subsections, so there is a total | |||
of seventeen sections added. | of seventeen sections added. | |||
o Sections 12 through 14 are unchanged. | o Sections 12 through 14 are unchanged. | |||
o Section 15 is unmodified except that the description of | o Section 15 is unmodified except that | |||
NFS4ERR_MOVED in Section 15.1 is revised as described in | ||||
Section 12.3 of the current document. | * The description of NFS4ERR_MOVED in Section 15.1 is revised as | |||
described in Section 13.3 of the current document. | ||||
* The description of the reclaim-related errors in Section 15.1.9 | ||||
is replaced by the revised descriptions in Section 13.7 of the | ||||
current document. | ||||
o Sections 16 and 17 are unchanged. | o Sections 16 and 17 are unchanged. | |||
o Section 18 is unmodified except that Section 18.35 is replaced by | o Section 18 is unmodified except that | |||
Section 13 in the current document. | ||||
* Section 18.35 is replaced by Section 14 in the current | ||||
document. | ||||
* Section 18.51 is replaced by Section 15 in the current | ||||
document. | ||||
o Sections 19 through 23 are unchanged. | o Sections 19 through 23 are unchanged. | |||
In terms of top-level sections, exclusive of appendices: | In terms of top-level sections, exclusive of appendices: | |||
o There is one heavily modified top-level section (Section 11) | o There is one heavily modified top-level section (Section 11) | |||
o There are four other modified top-level sections (Sections 1, 2, | o There are four other modified top-level sections (Sections 1, 2, | |||
15, and 18). | 15, and 18). | |||
skipping to change at page 76, line 40 ¶ | skipping to change at page 98, line 33 ¶ | |||
o Sections outside Section 11. | o Sections outside Section 11. | |||
In this table, the counts for top-level sections and TOC entries are | In this table, the counts for top-level sections and TOC entries are | |||
for sections including subsections while other counts are for | for sections including subsections while other counts are for | |||
sections exclusive of included subsections. | sections exclusive of included subsections. | |||
+------------+------+------+--------+------------+--------+ | +------------+------+------+--------+------------+--------+ | |||
| Status | Top | TOC | in 11 | not in 11 | Total | | | Status | Top | TOC | in 11 | not in 11 | Total | | |||
+------------+------+------+--------+------------+--------+ | +------------+------+------+--------+------------+--------+ | |||
| Replaced | 0 | 3 | 17 | 7 | 24 | | | Replaced | 0 | 6 | 21 | 15 | 36 | | |||
| Added | 0 | 6 | 23 | 0 | 23 | | | Added | 0 | 5 | 24 | 0 | 24 | | |||
| Deleted | 0 | 1 | 4 | 0 | 4 | | | Deleted | 0 | 1 | 4 | 0 | 4 | | |||
| Modified | 5 | 4 | 0 | 2 | 2 | | | Modified | 5 | 3 | 0 | 2 | 2 | | |||
| Unchanged | 18 | 212 | 16 | 918 | 934 | | | Unchanged | 18 | 210 | 12 | 910 | 922 | | |||
| in RFC5661 | 23 | 220 | 37 | 927 | 964 | | | in RFC5661 | 23 | 220 | 37 | 927 | 964 | | |||
+------------+------+------+--------+------------+--------+ | +------------+------+------+--------+------------+--------+ | |||
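The counts in the table above can be cross-checked mechanically. The following is a minimal sketch, not part of either draft: it transcribes the right-hand (-02) column and assumes two things that the text implies but does not state outright, namely that "in 11" plus "not in 11" equals "Total" for each row, and that the "in RFC5661" row is the column-wise sum of every row except "Added" (added sections do not exist in RFC5661). The names `rows`, `in_rfc5661`, and `check_table` are illustrative only.

```python
# Rows transcribed from the right-hand (-02) column of the table.
# Column order: Top, TOC, in 11, not in 11, Total.
rows = {
    "Replaced":  (0, 6, 21, 15, 36),
    "Added":     (0, 5, 24, 0, 24),
    "Deleted":   (0, 1, 4, 0, 4),
    "Modified":  (5, 3, 0, 2, 2),
    "Unchanged": (18, 210, 12, 910, 922),
}
in_rfc5661 = (23, 220, 37, 927, 964)


def check_table(rows, expected):
    """Return a list of inconsistencies; an empty list means none were found."""
    problems = []

    # Assumed rule 1: "in 11" + "not in 11" == "Total" for every row.
    for status, (_top, _toc, in11, not11, total) in rows.items():
        if in11 + not11 != total:
            problems.append(f"{status}: {in11} + {not11} != {total}")

    # Assumed rule 2: sections that already exist in RFC5661 are exactly
    # those that are replaced, deleted, modified, or unchanged, so the
    # "Added" row is left out of the column sums.
    existing = [counts for status, counts in rows.items() if status != "Added"]
    sums = tuple(sum(column) for column in zip(*existing))
    if sums != expected:
        problems.append(f"column sums {sums} != {expected}")

    return problems


if __name__ == "__main__":
    issues = check_table(rows, in_rfc5661)
    print(issues if issues else "table is internally consistent")
```

The same function can be run against the left-hand (-01) column by substituting its values, which makes it easy to confirm that a revision of the section lists has been carried through to the summary table.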
Acknowledgments | Acknowledgments | |||
The authors wish to acknowledge the important role of Andy Adamson of | The authors wish to acknowledge the important role of Andy Adamson of | |||
NetApp in clarifying the need for trunking discovery functionality, | NetApp in clarifying the need for trunking discovery functionality, | |||
and exploring the role of the location attributes in providing the | and exploring the role of the location attributes in providing the | |||
necessary support. | necessary support. | |||
The authors also wish to acknowledge the work of Xuan Qi of Oracle | The authors also wish to acknowledge the work of Xuan Qi of Oracle | |||
with NFSv4.1 client and server prototypes of transparent state | with NFSv4.1 client and server prototypes of transparent state | |||
migration functionality. | migration functionality. | |||
The authors wish to thank Trond Myklebust of Primary Data for his | The authors wish to thank others that brought attention to important | |||
comments related to trunking, helping to clarify the role of DNS in | issues. The comments of Trond Myklebust of Primary Data related to | |||
trunking discovery. | trunking helped to clarify the role of DNS in trunking discovery. | |||
Rick Macklem's comments brought attention to problems in the handling | ||||
of the per-fs version of RECLAIM_COMPLETE. | ||||
The authors wish to thank Olga Kornievskaia of NetApp for her helpful | The authors wish to thank Olga Kornievskaia of NetApp for her helpful | |||
review comments. | review comments. | |||
Authors' Addresses | Authors' Addresses | |||
David Noveck (editor) | David Noveck (editor) | |||
NetApp | NetApp | |||
1601 Trapelo Road | 1601 Trapelo Road | |||
Waltham, MA 02451 | Waltham, MA 02451 | |||
End of changes. 136 change blocks. | ||||
280 lines changed or deleted | 1327 lines changed or added | |||
This html diff was produced by rfcdiff 1.47. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |