draft-ietf-nfsv4-minorversion1-09.txt   draft-ietf-nfsv4-minorversion1-10.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: September 3, 2007 Editors Expires: September 5, 2007 Editors
March 2, 2007 March 4, 2007
NFSv4 Minor Version 1 NFSv4 Minor Version 1
draft-ietf-nfsv4-minorversion1-09.txt draft-ietf-nfsv4-minorversion1-10.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 3, 2007. This Internet-Draft will expire on September 5, 2007.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2007).
Abstract Abstract
This Internet-Draft describes NFSv4 minor version one, including This Internet-Draft describes NFSv4 minor version one, including
features retained from the base protocol and protocol extensions made features retained from the base protocol and protocol extensions made
subsequently. The current draft includes description of the major subsequently. The current draft includes description of the major
skipping to change at page 240, line 37 skipping to change at page 240, line 37
When a pNFS client encounters a new FSID, it issues a GETATTR to the When a pNFS client encounters a new FSID, it issues a GETATTR to the
NFSv4.1 server for the fs_layout_type (Section 5.13.1) attribute. If NFSv4.1 server for the fs_layout_type (Section 5.13.1) attribute. If
the attribute returns at least one layout type, and the layout the attribute returns at least one layout type, and the layout
type(s) returned is(are) among the set supported by the client, the type(s) returned is(are) among the set supported by the client, the
client knows that pNFS is a possibility for the filesystem. If, from client knows that pNFS is a possibility for the filesystem. If, from
the server that returned the new FSID, the client does not have a the server that returned the new FSID, the client does not have a
client ID that came from an EXCHANGE_ID result that returned client ID that came from an EXCHANGE_ID result that returned
EXCHGID4_FLAG_USE_PNFS_MDS, it must send an EXCHANGE_ID to the server EXCHGID4_FLAG_USE_PNFS_MDS, it must send an EXCHANGE_ID to the server
with the EXCHGID4_FLAG_USE_PNFS_MDS bit set. If the server's with the EXCHGID4_FLAG_USE_PNFS_MDS bit set. If the server's
response does not have EXCHGID4_FLAG_USE_PNFS_MDS, then contrary to response does not have EXCHGID4_FLAG_USE_PNFS_MDS, then contrary to
what the fs_layout_type attribute said, the client does not support what the fs_layout_type attribute said, the server does not support
pNFS, and the client will not be able to use pNFS to that server. pNFS, and the client will not be able to use pNFS to that server.
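The discovery steps above amount to a small decision procedure. The following C sketch models it under stated assumptions: `struct pnfs_probe` and the helper names are hypothetical, layout types are treated as plain `uint32_t` values, and the value of `EXCHGID4_FLAG_USE_PNFS_MDS` is taken from RFC 5661 rather than from this draft. It follows the -10 wording, in which an EXCHANGE_ID reply lacking the flag means the server does not support pNFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as assigned in RFC 5661; assumed unchanged for this draft. */
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000u

/* Hypothetical summary of what the client has learned about one FSID. */
struct pnfs_probe {
    const uint32_t *fs_layout_types;    /* GETATTR fs_layout_type result */
    size_t          n_fs_layout_types;
    bool            have_mds_clientid;  /* an earlier EXCHANGE_ID returned USE_PNFS_MDS */
    uint32_t        exchgid_resp_flags; /* reply flags from a new EXCHANGE_ID */
};

/* True if any advertised layout type is one the client supports. */
static bool layout_type_overlap(const struct pnfs_probe *p,
                                const uint32_t *supported, size_t n_supported)
{
    for (size_t i = 0; i < p->n_fs_layout_types; i++)
        for (size_t j = 0; j < n_supported; j++)
            if (p->fs_layout_types[i] == supported[j])
                return true;
    return false;
}

/* Decide whether the client can use pNFS for this file system. */
static bool pnfs_usable(const struct pnfs_probe *p,
                        const uint32_t *supported, size_t n_supported)
{
    assert(p != NULL);
    if (!layout_type_overlap(p, supported, n_supported))
        return false;   /* pNFS is not a possibility for this FSID */
    if (p->have_mds_clientid)
        return true;    /* the existing client ID already supports pNFS */
    /* The client sent EXCHANGE_ID with USE_PNFS_MDS set; the reply is
     * authoritative even if fs_layout_type suggested otherwise. */
    return (p->exchgid_resp_flags & EXCHGID4_FLAG_USE_PNFS_MDS) != 0;
}
```

The EXCHANGE_ID reply is checked last because, per the text, it overrides what fs_layout_type advertised.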
Once the client has a client ID that supports pNFS, it creates a Once the client has a client ID that supports pNFS, it creates a
persistent session over the client ID, requesting persistence. persistent session over the client ID, requesting persistence.
If the client wants to create a file on the file system identified by If the client wants to create a file on the file system identified by
the FSID that supports pNFS, it issues an OPEN with a create type of the FSID that supports pNFS, it issues an OPEN with a create type of
GUARDED4 (if it wants an exclusive create), or UNCHECKED4 (if it does GUARDED4 (if it wants an exclusive create), or UNCHECKED4 (if it does
not want an exclusive create). Among the various attributes it sets not want an exclusive create). Among the various attributes it sets
in createattrs, it includes layout_hint and fills it with information in createattrs, it includes layout_hint and fills it with information
skipping to change at page 261, line 15 skipping to change at page 261, line 15
option the client has is getting a new layout or just rewriting the option the client has is getting a new layout or just rewriting the
data through the metadata server. If the flag nfl_commit_through_mds data through the metadata server. If the flag nfl_commit_through_mds
is FALSE, the client should not send COMMIT to the metadata server. is FALSE, the client should not send COMMIT to the metadata server.
Although it is valid to send COMMIT to the metadata server, it should Although it is valid to send COMMIT to the metadata server, it should
be used only to commit data that was written through the metadata be used only to commit data that was written through the metadata
server. See Section 12.7.6 for recovery options. server. See Section 12.7.6 for recovery options.
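A minimal sketch of the COMMIT routing rule, assuming that a TRUE nfl_commit_through_mds means data written to data servers is committed through the metadata server (the text above states only the FALSE case explicitly). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for how a range of unstable data was written. */
enum write_path    { WROTE_VIA_MDS, WROTE_VIA_DS };
enum commit_target { COMMIT_TO_MDS, COMMIT_TO_DS };

/* Pick where to send COMMIT for a range of unstable data. */
static enum commit_target choose_commit_target(enum write_path wp,
                                               bool nfl_commit_through_mds)
{
    /* Data written through the metadata server is committed there. */
    if (wp == WROTE_VIA_MDS)
        return COMMIT_TO_MDS;
    /* Data written to a data server: the layout flag decides. When it is
     * FALSE, COMMIT should not go to the metadata server. */
    return nfl_commit_through_mds ? COMMIT_TO_MDS : COMMIT_TO_DS;
}
```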
13.9. Global Stateid Requirements 13.9. Global Stateid Requirements
Note that there are no stateids embedded within the layout returned by Note that there are no stateids embedded within the layout returned by
the metadata server to the pNFS client. The client MUST use the the metadata server to the pNFS client. The client uses a stateid
stateid representing open or lock state (i.e., NOT a delegation returned previously by the metadata server (including results from
stateid) returned by an earlier operation request to the metadata OPEN -- a delegation stateid is acceptable as well as a
server (e.g., OPEN, LOCK), or a special stateid to perform I/O on the non-delegation stateid -- lock operations, WANT_DELEGATION, and also from
data servers, as in regular NFSv4.1. Special stateid usage for I/O the CB_PUSH_DELEG callback operation) or a special stateid to perform
is subject to the NFSv4.1 protocol specification. The stateid used I/O on the data servers, as in regular NFSv4.1. Special stateid
for I/O MUST have the same effect and be subject to the same usage for I/O is subject to the NFSv4.1 protocol specification. The
validation on the data server as it would if the I/O were being performed stateid used for I/O MUST have the same effect and be subject to the
on the metadata server itself in the absence of pNFS. This has the same validation on the data server as it would if the I/O were being
implication that stateids are globally valid on both the metadata and performed on the metadata server itself in the absence of pNFS. This
data servers. This requires the metadata server to propagate changes has the implication that stateids are globally valid on both the
in lock and open state to the data servers, so that the data servers metadata and data servers. This requires the metadata server to
can validate I/O accesses. This is discussed further in propagate changes in lock and open state to the data servers, so that
Section 13.11. Depending on when stateids are propagated, the the data servers can validate I/O accesses. This is discussed
existence of a valid stateid on the data server may act as proof of a further in Section 13.11. Depending on when stateids are propagated,
valid layout. the existence of a valid stateid on the data server may act as proof
of a valid layout.
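To make the change between the two draft revisions concrete, the sketch below classifies a stateid by where the client obtained it and checks it against each revision's rule for data-server I/O. The `stateid_source` enum and both function names are hypothetical simplifications; a real client would carry the stateid itself, and special stateid usage remains subject to the full NFSv4.1 rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of where a stateid came from. */
enum stateid_source {
    SRC_OPEN_NONDELEG,    /* open stateid returned by OPEN */
    SRC_OPEN_DELEG,       /* delegation stateid returned by OPEN */
    SRC_LOCK,             /* returned by a lock operation */
    SRC_WANT_DELEGATION,  /* delegation stateid from WANT_DELEGATION */
    SRC_CB_PUSH_DELEG,    /* delegation stateid pushed by CB_PUSH_DELEG */
    SRC_SPECIAL,          /* special stateid, subject to NFSv4.1 rules */
    SRC_NOT_FROM_MDS      /* anything not issued by the metadata server */
};

/* -09 wording: open or lock state only (no delegation stateids), or special. */
static bool ds_io_ok_draft09(enum stateid_source s)
{
    return s == SRC_OPEN_NONDELEG || s == SRC_LOCK || s == SRC_SPECIAL;
}

/* -10 wording: any stateid previously returned by the metadata server,
 * including delegation stateids, or a special stateid. */
static bool ds_io_ok_draft10(enum stateid_source s)
{
    return s != SRC_NOT_FROM_MDS;
}
```

The practical difference is that delegation stateids, whether from OPEN, WANT_DELEGATION, or CB_PUSH_DELEG, become usable for data-server I/O in -10.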
13.10. The Layout Iomode 13.10. The Layout Iomode
The layout iomode need not be used by the metadata server when The layout iomode need not be used by the metadata server when
servicing NFSv4.1 file-based layouts, although in some circumstances servicing NFSv4.1 file-based layouts, although in some circumstances
it may be useful. For example, if the server implementation it may be useful. For example, if the server implementation
supports reading from read-only replicas or mirrors, it would be supports reading from read-only replicas or mirrors, it would be
useful for the server to return a layout enabling the client to do useful for the server to return a layout enabling the client to do
so. As such, the client SHOULD set the iomode based on its intent to so. As such, the client SHOULD set the iomode based on its intent to
read or write the data. The client may default to an iomode of read or write the data. The client may default to an iomode of
skipping to change at page 442, line 39 skipping to change at page 442, line 39
union layoutrecall4 switch(layoutrecall_type4 recalltype) { union layoutrecall4 switch(layoutrecall_type4 recalltype) {
case LAYOUTRECALL4_FILE: case LAYOUTRECALL4_FILE:
layoutrecall_file4 lor_layout; layoutrecall_file4 lor_layout;
case LAYOUTRECALL4_FSID: case LAYOUTRECALL4_FSID:
fsid4 lor_fsid; fsid4 lor_fsid;
case LAYOUTRECALL4_ALL: case LAYOUTRECALL4_ALL:
void; void;
}; };
struct CB_LAYOUTRECALL4args { struct CB_LAYOUTRECALL4args {
layouttype4 lora_type; layouttype4 clora_type;
layoutiomode4 lora_iomode; layoutiomode4 clora_iomode;
bool lora_changed; bool clora_changed;
layoutrecall4 lora_recall; layoutrecall4 clora_recall;
}; };
19.3.3. RESULT 19.3.3. RESULT
struct CB_LAYOUTRECALL4res { struct CB_LAYOUTRECALL4res {
nfsstat4 lorr_status; nfsstat4 clorr_status;
}; };
19.3.4. DESCRIPTION 19.3.4. DESCRIPTION
The CB_LAYOUTRECALL operation is used to begin the process of The CB_LAYOUTRECALL operation is used to begin the process of
recalling layout segments, a layout, all layouts pertaining to a recalling layout segments, a layout, all layouts pertaining to a
particular file system (FSID), or layouts in all file systems (ALL). particular file system (FSID), or layouts in all file systems (ALL).
If LAYOUTRECALL4_FILE is specified, the lrf_offset and lrf_length If LAYOUTRECALL4_FILE is specified, the lrf_offset and lrf_length
fields specify the layout segments. If a lrf_length of all ones is fields specify the layout segments. If a lrf_length of all ones is
specified then all layout segments identified by the current file specified then all layout segments identified by the current file
handle, lora_type, lora_iomode, and corresponding to the octet range handle, clora_type, clora_iomode, and corresponding to the octet
from lrf_offset to the end-of-file MUST be returned (via range from lrf_offset to the end-of-file MUST be returned (via
LAYOUTRETURN, see Section 17.44). The lora_iomode specifies the set LAYOUTRETURN, see Section 17.44). The clora_iomode specifies the set
of layouts to be returned. An lora_iomode of LAYOUTIOMODE4_ANY of layouts to be returned. A clora_iomode of LAYOUTIOMODE4_ANY
specifies that all matching layout segments, regardless of iomode, specifies that all matching layout segments, regardless of iomode,
must be returned; otherwise, only layout segments that exactly match must be returned; otherwise, only layout segments that exactly match
the iomode must be returned. If lora_iomode is LAYOUTIOMODE4_ANY, the iomode must be returned. If clora_iomode is LAYOUTIOMODE4_ANY,
lrf_offset is zero, and lrf_length is all ones, then the entire layout lrf_offset is zero, and lrf_length is all ones, then the entire layout
is to be returned. is to be returned.
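The matching rules for a LAYOUTRECALL4_FILE recall can be sketched as a predicate over a single layout segment. This is an illustration, not the draft's algorithm: `struct layout_seg` and the helpers are hypothetical, the iomode values follow RFC 5661, and any byte-range overlap is treated as a match (a client could instead return only the overlapping portion). An all-ones length is read as "through end of file", as the text specifies for lrf_length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An all-ones 64-bit length means "through end of file". */
#define NFS4_LENGTH_ALL_ONES UINT64_MAX

/* Values as in RFC 5661; assumed to match this draft. */
enum layoutiomode4 {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
};

/* Hypothetical client record for one layout segment. */
struct layout_seg {
    uint64_t offset;
    uint64_t length;
    enum layoutiomode4 iomode;
};

/* Exclusive end of [off, off+len), saturating at UINT64_MAX. */
static uint64_t range_end(uint64_t off, uint64_t len)
{
    if (len == NFS4_LENGTH_ALL_ONES || len > UINT64_MAX - off)
        return UINT64_MAX;
    return off + len;
}

/* True if the segment falls under a LAYOUTRECALL4_FILE recall. */
static bool seg_must_be_returned(const struct layout_seg *s,
                                 enum layoutiomode4 clora_iomode,
                                 uint64_t lrf_offset, uint64_t lrf_length)
{
    assert(s != NULL);
    /* LAYOUTIOMODE4_ANY matches every segment; otherwise exact match only. */
    if (clora_iomode != LAYOUTIOMODE4_ANY && s->iomode != clora_iomode)
        return false;
    /* Half-open ranges [a, b) and [c, d) overlap iff a < d && c < b. */
    return s->offset < range_end(lrf_offset, lrf_length) &&
           lrf_offset < range_end(s->offset, s->length);
}
```

With clora_iomode set to LAYOUTIOMODE4_ANY, lrf_offset zero, and an all-ones lrf_length, every segment matches, which corresponds to returning the entire layout.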
If the "lora_changed" field is TRUE, then the client SHOULD NOT write If the clora_changed field is TRUE, then the client SHOULD NOT write
and commit its modified data to the storage devices specified by the and commit its modified data to the storage devices specified by the
layout being recalled. Instead, it is preferable for the client to layout being recalled. Instead, it is preferable for the client to
write and commit the modified data through the metadata server. write and commit the modified data through the metadata server.
Alternatively, the client may attempt to obtain a new layout. Note: Alternatively, the client may attempt to obtain a new layout. Note:
in order to obtain a new layout the client must first return the old in order to obtain a new layout the client must first return the old
layout. Since obtaining a new layout is not guaranteed to succeed, layout. Since obtaining a new layout is not guaranteed to succeed,
the client must be ready to write and commit its modified data the client must be ready to write and commit its modified data
through the metadata server. through the metadata server.
If the client does not hold any layout segment either matching or If the client does not hold any layout segment either matching or
skipping to change at page 444, line 10 skipping to change at page 444, line 10
19.3.5. IMPLEMENTATION 19.3.5. IMPLEMENTATION
The client should reply to the callback immediately. Replying does The client should reply to the callback immediately. Replying does
not complete the recall except when an error is returned; otherwise not complete the recall except when an error is returned; otherwise
the recall is not complete until the layout(s) are returned using a the recall is not complete until the layout(s) are returned using a
LAYOUTRETURN operation. LAYOUTRETURN operation.
The client should complete any in-flight I/O operations using the The client should complete any in-flight I/O operations using the
recalled layout(s) before returning it/them via LAYOUTRETURN. If the recalled layout(s) before returning it/them via LAYOUTRETURN. If the
client has buffered modified data, there are a number of options for client has buffered modified data, there are a number of options for
writing and committing that data. If "lora_changed" is false, the writing and committing that data. If clora_changed is false, the
client may choose to write modified data directly to storage before client may choose to write modified data directly to storage before
calling LAYOUTRETURN. However, if "lora_changed" is true, the client calling LAYOUTRETURN. However, if clora_changed is true, the client
may either choose to write it later using normal NFSv4 WRITE may either choose to write it later using normal NFSv4 WRITE
operations to the metadata server or it may attempt to obtain a new operations to the metadata server or it may attempt to obtain a new
layout, after first returning the recalled layout, using the new layout, after first returning the recalled layout, using the new
layout to write the modified data. Regardless of whether the client layout to write the modified data. Regardless of whether the client
is holding a layout, it may always write data through the metadata is holding a layout, it may always write data through the metadata
server. server.
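The options for buffered modified data during a recall can be summarized as a choice among three plans. The sketch below shows one possible client policy; the plan names and the `prefer_new_layout` and `layoutget_ok` inputs are hypothetical. Writing through the metadata server is the fallback because a new layout can only be requested after the old one is returned and may not be granted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the flush strategies described in the text. */
enum flush_plan {
    FLUSH_DS_THEN_RETURN,   /* write via layout, LAYOUTCOMMIT, then LAYOUTRETURN */
    FLUSH_MDS_AFTER_RETURN, /* LAYOUTRETURN, then WRITE/COMMIT to the MDS */
    FLUSH_NEW_LAYOUT        /* LAYOUTRETURN, new LAYOUTGET, then use new layout */
};

/* One possible client policy; writing through the MDS is always allowed. */
static enum flush_plan plan_recall_flush(bool clora_changed,
                                         bool prefer_new_layout,
                                         bool layoutget_ok)
{
    if (!clora_changed)
        return FLUSH_DS_THEN_RETURN;
    /* Storage devices changed: do not write to them. A new layout is only
     * requested after the old one is returned and may be refused. */
    if (prefer_new_layout && layoutget_ok)
        return FLUSH_NEW_LAYOUT;
    return FLUSH_MDS_AFTER_RETURN;
}
```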
If modified data is written while the layout is held, the client must If modified data is written while the layout is held, the client must
still issue LAYOUTCOMMIT operations at the appropriate time, still issue LAYOUTCOMMIT operations at the appropriate time,
especially before issuing the LAYOUTRETURN. If a large amount of especially before issuing the LAYOUTRETURN. If a large amount of
skipping to change at page 446, line 11 skipping to change at page 446, line 11
notify_remove4 n_remove_notify; notify_remove4 n_remove_notify;
case NOTIFY4_ADD_ENTRY: case NOTIFY4_ADD_ENTRY:
notify_add4 n_add_notify; notify_add4 n_add_notify;
case NOTIFY4_RENAME_ENTRY: case NOTIFY4_RENAME_ENTRY:
notify_rename4 n_rename_notify; notify_rename4 n_rename_notify;
case NOTIFY4_CHANGE_COOKIE_VERIFIER: case NOTIFY4_CHANGE_COOKIE_VERIFIER:
notify_verifier4 n_verf_notify; notify_verifier4 n_verf_notify;
}; };
struct CB_NOTIFY4args { struct CB_NOTIFY4args {
stateid4 na_stateid; stateid4 cna_stateid;
notify4 na_changes<>; nfs_fh4 cna_fh;
notify4 cna_changes<>;
}; };
19.4.3. RESULT 19.4.3. RESULT
struct CB_NOTIFY4res { struct CB_NOTIFY4res {
nfsstat4 nr_status; nfsstat4 cnr_status;
}; };
19.4.4. DESCRIPTION 19.4.4. DESCRIPTION
The CB_NOTIFY operation is used by the server to send notifications The CB_NOTIFY operation is used by the server to send notifications
to clients about changes in a delegated directory. These to clients about changes in a delegated directory. These
notifications are sent over the callback path. The notification is notifications are sent over the callback path. The notification is
sent once the original request has been processed on the server. The sent once the original request has been processed on the server. The
server will send an array of notifications for all changes that might server will send an array of notifications for all changes that might
have occurred in the directory. The notify_type4 can only have one have occurred in the directory. The notify_type4 can only have one
skipping to change at page 448, line 8 skipping to change at page 448, line 8
19.5. Operation 7: CB_PUSH_DELEG 19.5. Operation 7: CB_PUSH_DELEG
19.5.1. SYNOPSIS 19.5.1. SYNOPSIS
fh, stateid -> { } fh, stateid -> { }
19.5.2. ARGUMENT 19.5.2. ARGUMENT
struct CB_PUSH_DELEG4args { struct CB_PUSH_DELEG4args {
nfs_fh4 pda_fh; stateid4 cpda_stateid;
stateid4 pda_stateid; nfs_fh4 cpda_fh;
open_delegation4 pda_delegation; open_delegation4 cpda_delegation;
}; };
19.5.3. RESULT 19.5.3. RESULT
struct CB_PUSH_DELEG4res { struct CB_PUSH_DELEG4res {
nfsstat4 cpdr_status; nfsstat4 cpdr_status;
}; };
19.5.4. DESCRIPTION 19.5.4. DESCRIPTION
CB_PUSH_DELEG is used by the server to both signal to the client that CB_PUSH_DELEG is used by the server to both signal to the client that
the delegation it wants is available and to simultaneously offer the the delegation it wants is available and to simultaneously offer the
delegation to the client. The client has the choice of accepting the delegation to the client. The client has the choice of accepting the
delegation by returning NFS4_OK to the server, delaying the decision delegation by returning NFS4_OK to the server, delaying the decision
to accept the offered delegation by returning NFS4ERR_DELAY, or to accept the offered delegation by returning NFS4ERR_DELAY, or
permanently rejecting the offer of the delegation via any other error permanently rejecting the offer of the delegation via any other error
status. status.
The server MUST send in pda_delegation a delegation corresponding to The server MUST send in cpda_delegation a delegation corresponding to
the type of what the client requested in the OPEN, WANT_DELEGATION, the type of what the client requested in the OPEN, WANT_DELEGATION,
or GET_DIR_DELEGATION request. or GET_DIR_DELEGATION request.
If the client does return NFS4ERR_DELAY and there is a conflicting If the client does return NFS4ERR_DELAY and there is a conflicting
delegation request, the server MAY process it at the expense of the delegation request, the server MAY process it at the expense of the
client that returned NFS4ERR_DELAY. The client's want will not be client that returned NFS4ERR_DELAY. The client's want will not be
cancelled, but MAY be processed behind other delegation requests or cancelled, but MAY be processed behind other delegation requests or
registered wants. registered wants.
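On the server side, the three possible client replies to CB_PUSH_DELEG can be classified as follows. NFS4_OK (0) and NFS4ERR_DELAY (10008) are the standard NFSv4 status values; the `push_outcome` enum and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NFS4_OK       0u
#define NFS4ERR_DELAY 10008u

/* Hypothetical server-side interpretation of cpdr_status. */
enum push_outcome { PUSH_ACCEPTED, PUSH_DEFERRED, PUSH_REJECTED };

static enum push_outcome classify_push_deleg_reply(uint32_t cpdr_status)
{
    switch (cpdr_status) {
    case NFS4_OK:       return PUSH_ACCEPTED; /* client took the delegation */
    case NFS4ERR_DELAY: return PUSH_DEFERRED; /* want kept; may be queued behind others */
    default:            return PUSH_REJECTED; /* any other status: permanent rejection */
    }
}
```

A PUSH_DEFERRED result is the case in which the server MAY serve a conflicting delegation request first.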
19.5.5. IMPLEMENTATION 19.5.5. IMPLEMENTATION
skipping to change at page 449, line 19 skipping to change at page 449, line 19
const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_DIR_DLG = 2;
const RCA4_TYPE_MASK_FILE_LAYOUT = 3; const RCA4_TYPE_MASK_FILE_LAYOUT = 3;
const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4;
const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7; const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15;
struct CB_RECALL_ANY4args { struct CB_RECALL_ANY4args {
uint32_t rca_objects_to_keep; uint32_t craa_objects_to_keep;
bitmap4 rca_type_mask; bitmap4 craa_type_mask;
}; };
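craa_type_mask is a bitmap4, which in NFSv4 is an array of 32-bit words in which bit n lives in word n / 32 at position n % 32. The sketch below builds such a mask from some of the constants listed above. Treating a MIN/MAX pair as a contiguous range of bits to set is an assumption about how those reserved ranges would be used, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as listed in the CB_RECALL_ANY argument definition above. */
#define RCA4_TYPE_MASK_DIR_DLG        2
#define RCA4_TYPE_MASK_FILE_LAYOUT    3
#define RCA4_TYPE_MASK_BLK_LAYOUT_MIN 4
#define RCA4_TYPE_MASK_BLK_LAYOUT_MAX 7

/* Hypothetical helpers over a bitmap4 held as an array of uint32_t. */
static void bitmap4_set(uint32_t *words, size_t nwords, unsigned bit)
{
    assert(bit / 32 < nwords);          /* caller must size the array */
    words[bit / 32] |= (uint32_t)1 << (bit % 32);
}

static bool bitmap4_isset(const uint32_t *words, size_t nwords, unsigned bit)
{
    if (bit / 32 >= nwords)
        return false;                   /* bits beyond the array are clear */
    return ((words[bit / 32] >> (bit % 32)) & 1u) != 0;
}

/* Set every bit in [lo, hi], e.g. a reserved range for one layout family. */
static void bitmap4_set_range(uint32_t *words, size_t nwords,
                              unsigned lo, unsigned hi)
{
    for (unsigned b = lo; b <= hi; b++)
        bitmap4_set(words, nwords, b);
}
```

Setting RCA4_TYPE_MASK_DIR_DLG plus the block-layout range in a one-word mask yields 0xF4 (bits 2, 4, 5, 6 and 7).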
19.6.3. RESULT 19.6.3. RESULT
struct CB_RECALL_ANY4res { struct CB_RECALL_ANY4res {
nfsstat4 rcr_status; nfsstat4 crar_status;
}; };
19.6.4. DESCRIPTION 19.6.4. DESCRIPTION
The server may decide that it cannot hold all of the state for The server may decide that it cannot hold all of the state for
recallable objects, such as delegations and layouts, without running recallable objects, such as delegations and layouts, without running
out of resources. In such a case, it is free to recall individual out of resources. In such a case, it is free to recall individual
objects to reduce the load, but this would be far from optimal. objects to reduce the load, but this would be far from optimal.
Because the general purpose of such recallable objects as delegations Because the general purpose of such recallable objects as delegations
skipping to change at page 451, line 46 skipping to change at page 451, line 46
TBD TBD
19.7.2. ARGUMENT 19.7.2. ARGUMENT
typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args;
19.7.3. RESULT 19.7.3. RESULT
struct CB_RECALLABLE_OBJ_AVAIL4res { struct CB_RECALLABLE_OBJ_AVAIL4res {
nfsstat4 roa_status; nfsstat4 croa_status;
}; };
19.7.4. DESCRIPTION 19.7.4. DESCRIPTION
CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client
that the server has resources to grant recallable objects that might that the server has resources to grant recallable objects that might
previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEGATION, previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEGATION,
or LAYOUTGET. or LAYOUTGET.
The argument objects_to_keep means the total number of recallable The argument objects_to_keep means the total number of recallable
skipping to change at page 455, line 33 skipping to change at page 455, line 33
19.10. Operation 12: CB_WANTS_CANCELLED 19.10. Operation 12: CB_WANTS_CANCELLED
19.10.1. SYNOPSIS 19.10.1. SYNOPSIS
fh, size -> - fh, size -> -
19.10.2. ARGUMENT 19.10.2. ARGUMENT
struct CB_WANTS_CANCELLED4args { struct CB_WANTS_CANCELLED4args {
bool wca_contended_wants_cancelled; bool cwca_contended_wants_cancelled;
bool wca_resourced_wants_cancelled; bool cwca_resourced_wants_cancelled;
}; };
19.10.3. RESULT 19.10.3. RESULT
struct CB_WANTS_CANCELLED4res { struct CB_WANTS_CANCELLED4res {
nfsstat4 wca_status; nfsstat4 cwcr_status;
}; };
19.10.4. DESCRIPTION 19.10.4. DESCRIPTION
The CB_WANTS_CANCELLED operation is used to notify the client that The CB_WANTS_CANCELLED operation is used to notify the client that
some or all of the wants it registered for recallable delegations and some or all of the wants it registered for recallable delegations and
layouts have been canceled. layouts have been canceled.
If wca_contended_wants_cancelled is TRUE, this indicates the server If cwca_contended_wants_cancelled is TRUE, this indicates the server
will not be pushing to the client any delegations that become will not be pushing to the client any delegations that become
available after contention passes. available after contention passes.
If wca_resourced_wants_cancelled is TRUE, this indicates the server If cwca_resourced_wants_cancelled is TRUE, this indicates the server
will not notify the client when there are resources on the server will not notify the client when there are resources on the server
to grant delegations or layouts. to grant delegations or layouts.
After receiving a CB_WANTS_CANCELLED operation, the client is free to After receiving a CB_WANTS_CANCELLED operation, the client is free to
attempt to acquire the delegations or layouts it was waiting for, and attempt to acquire the delegations or layouts it was waiting for, and
possibly re-register wants. possibly re-register wants.
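A client's handling of CB_WANTS_CANCELLED can be sketched as clearing two flags. The sketch assumes, hypothetically, that a client tracks contended wants (satisfied later by CB_PUSH_DELEG) separately from resource wants (satisfied by CB_RECALLABLE_OBJ_AVAIL); the structure and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client bookkeeping for registered wants. */
struct want_state {
    bool awaiting_push;   /* delegation to be pushed via CB_PUSH_DELEG after contention */
    bool awaiting_avail;  /* resource signal via CB_RECALLABLE_OBJ_AVAIL */
};

/* Apply CB_WANTS_CANCELLED; returns true if the client should now try to
 * acquire the objects itself and possibly re-register its wants. */
static bool apply_wants_cancelled(struct want_state *w,
                                  bool cwca_contended_wants_cancelled,
                                  bool cwca_resourced_wants_cancelled)
{
    assert(w != NULL);
    if (cwca_contended_wants_cancelled)
        w->awaiting_push = false;
    if (cwca_resourced_wants_cancelled)
        w->awaiting_avail = false;
    return cwca_contended_wants_cancelled || cwca_resourced_wants_cancelled;
}
```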
19.10.5. IMPLEMENTATION 19.10.5. IMPLEMENTATION
19.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible lock 19.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible lock
availability availability
19.11.1. SYNOPSIS 19.11.1. SYNOPSIS
fh, lockowner -> () fh, lockowner -> ()
19.11.2. ARGUMENT 19.11.2. ARGUMENT
struct CB_NOTIFY_LOCK4args { struct CB_NOTIFY_LOCK4args {
nfs_fh4 file; lock_owner4 cnla_lock_owner;
lock_owner4 lock_owner; nfs_fh4 cnla_fh;
}; };
19.11.3. RESULT 19.11.3. RESULT
struct CB_NOTIFY_LOCK4res { struct CB_NOTIFY_LOCK4res {
nfsstat4 cnlr_status; nfsstat4 cnlr_status;
}; };
19.11.4. DESCRIPTION 19.11.4. DESCRIPTION
End of changes. 24 change blocks.
52 lines changed or deleted 54 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/