draft-ietf-nfsv4-sess-00.txt   draft-ietf-nfsv4-sess-01.txt 
INTERNET-DRAFT Tom Talpey INTERNET-DRAFT Tom Talpey
Expires: January 2005 Network Appliance, Inc. Expires: August 2005 Network Appliance, Inc.
Spencer Shepler Spencer Shepler
Sun Microsystems, Inc. Sun Microsystems, Inc.
Jon Bauman Jon Bauman
University of Michigan University of Michigan
July, 2004 February, 2005
NFSv4 Session Extensions NFSv4 Session Extensions
draft-ietf-nfsv4-sess-00 draft-ietf-nfsv4-sess-01
Status of this Memo Status of this Memo
By submitting this Internet-Draft, I certify that any applicable By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed, patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668. disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 41 skipping to change at page 1, line 42
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt The list of http://www.ietf.org/ietf/1id-abstracts.txt The list of
Internet-Draft Shadow Directories can be accessed at Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2005). All Rights Reserved.
Abstract Abstract
Extensions are proposed to NFS version 4 which enable it to support Extensions are proposed to NFS version 4 which enable it to support
long-lived sessions, endpoint management, and operation atop a long-lived sessions, endpoint management, and operation atop a
variety of RPC transports, including TCP and RDMA. These variety of RPC transports, including TCP and RDMA. These
extensions enable support for reliably implemented client response extensions enable support for reliably implemented client response
caching by NFSv4 servers, enhanced security, multipathing and caching by NFSv4 servers, enhanced security, multipathing and
trunking of transport connections. These extensions provide trunking of transport connections. These extensions provide
identical benefits over both TCP and RDMA connection types. identical benefits over both TCP and RDMA connection types.
skipping to change at page 2, line 28 skipping to change at page 2, line 28
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Problem Statement . . . . . . . . . . . . . . . . . . . 5 1.2. Problem Statement . . . . . . . . . . . . . . . . . . . 5
1.3. NFSv4 Session Extension Characteristics . . . . . . . . 7 1.3. NFSv4 Session Extension Characteristics . . . . . . . . 7
2. Transport Issues . . . . . . . . . . . . . . . . . . . . . 7 2. Transport Issues . . . . . . . . . . . . . . . . . . . . . 7
2.1. Session Model . . . . . . . . . . . . . . . . . . . . . 7 2.1. Session Model . . . . . . . . . . . . . . . . . . . . . 7
2.1.1. Connection State . . . . . . . . . . . . . . . . . . . 9 2.1.1. Connection State . . . . . . . . . . . . . . . . . . . 9
2.1.2. NFSv4 Channels, Sessions and Connections . . . . . . . 9 2.1.2. NFSv4 Channels, Sessions and Connections . . . . . . . 9
2.1.3. Reconnection, Trunking and Failover . . . . . . . . . 11 2.1.3. Reconnection, Trunking and Failover . . . . . . . . . 11
2.1.4. Server Duplicate Request Cache . . . . . . . . . . . . 12 2.1.4. Server Duplicate Request Cache . . . . . . . . . . . . 12
2.2. Session Initialization and Transfer Models . . . . . . . 13 2.2. Session Initialization and Transfer Models . . . . . . . 13
2.2.1. RDMA Requirements . . . . . . . . . . . . . . . . . . 13 2.2.1. Session Negotiation . . . . . . . . . . . . . . . . . 13
2.2.2. Session Negotiation . . . . . . . . . . . . . . . . . 14 2.2.2. RDMA Requirements . . . . . . . . . . . . . . . . . . 15
2.2.3. Connection Resources . . . . . . . . . . . . . . . . . 15 2.2.3. RDMA Connection Resources . . . . . . . . . . . . . . 15
2.2.4. Inline Transfer Model . . . . . . . . . . . . . . . . 16 2.2.4. TCP and RDMA Inline Transfer Model . . . . . . . . . . 16
2.2.5. Direct Transfer Model . . . . . . . . . . . . . . . . 19 2.2.5. RDMA Direct Transfer Model . . . . . . . . . . . . . . 19
2.3. Connection Models . . . . . . . . . . . . . . . . . . . 21 2.3. Connection Models . . . . . . . . . . . . . . . . . . . 22
2.3.1. TCP Connection Model . . . . . . . . . . . . . . . . . 23 2.3.1. TCP Connection Model . . . . . . . . . . . . . . . . . 23
2.3.2. Negotiated RDMA Connection Model . . . . . . . . . . . 23 2.3.2. Negotiated RDMA Connection Model . . . . . . . . . . . 24
2.3.3. Automatic RDMA Connection Model . . . . . . . . . . . 24 2.3.3. Automatic RDMA Connection Model . . . . . . . . . . . 24
2.4. Buffer Management, Transfer, Flow Control . . . . . . . 25 2.4. Buffer Management, Transfer, Flow Control . . . . . . . 25
2.5. Retry and Replay . . . . . . . . . . . . . . . . . . . . 28 2.5. Retry and Replay . . . . . . . . . . . . . . . . . . . . 28
2.6. The Back Channel . . . . . . . . . . . . . . . . . . . . 28 2.6. The Back Channel . . . . . . . . . . . . . . . . . . . . 28
2.7. COMPOUND Sizing Issues . . . . . . . . . . . . . . . . . 30 2.7. COMPOUND Sizing Issues . . . . . . . . . . . . . . . . . 30
2.8. Data Alignment . . . . . . . . . . . . . . . . . . . . . 30 2.8. Data Alignment . . . . . . . . . . . . . . . . . . . . . 30
3. NFSv4 Integration . . . . . . . . . . . . . . . . . . . . 31 3. NFSv4 Integration . . . . . . . . . . . . . . . . . . . . 31
3.1. Minor Versioning . . . . . . . . . . . . . . . . . . . . 32 3.1. Minor Versioning . . . . . . . . . . . . . . . . . . . . 32
3.2. Slot Identifiers and Server Duplicate Request Cache . . 32 3.2. Slot Identifiers and Server Duplicate Request Cache . . 32
3.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 35 3.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 35
3.4. eXternal Data Representation Efficiency . . . . . . . . 36 3.4. eXternal Data Representation Efficiency . . . . . . . . 36
3.5. Effect of Sessions on Existing Operations . . . . . . . 36 3.5. Effect of Sessions on Existing Operations . . . . . . . 36
3.6. Authentication Efficiencies . . . . . . . . . . . . . . 37 3.6. Authentication Efficiencies . . . . . . . . . . . . . . 37
4. Security Considerations . . . . . . . . . . . . . . . . . 38 4. Security Considerations . . . . . . . . . . . . . . . . . 38
5. IANA Considerations . . . . . . . . . . . . . . . . . . . 39 4.1. Authentication . . . . . . . . . . . . . . . . . . . . . 40
6. NFSv4 Protocol Extensions . . . . . . . . . . . . . . . . 40 5. IANA Considerations . . . . . . . . . . . . . . . . . . . 41
6.1. Operation: CREATECLIENTID . . . . . . . . . . . . . . . 40 6. NFSv4 Protocol Extensions . . . . . . . . . . . . . . . . 41
6.2. Operation: CREATE_SESSION . . . . . . . . . . . . . . . 45 6.1. Operation: CREATECLIENTID . . . . . . . . . . . . . . . 41
6.3. Operation: BIND_BACKCHANNEL . . . . . . . . . . . . . . 50 6.2. Operation: CREATESESSION . . . . . . . . . . . . . . . . 46
6.4. Operation: DESTROYSESSION . . . . . . . . . . . . . . . 52 6.3. Operation: BIND_BACKCHANNEL . . . . . . . . . . . . . . 51
6.5. Operation: SEQUENCE . . . . . . . . . . . . . . . . . . 53 6.4. Operation: DESTROYSESSION . . . . . . . . . . . . . . . 53
6.6. Callback operation: CB_RECALLCREDIT . . . . . . . . . . 55 6.5. Operation: SEQUENCE . . . . . . . . . . . . . . . . . . 54
7. NFSv4 Session Protocol Description . . . . . . . . . . . . 55 6.6. Callback operation: CB_RECALLCREDIT . . . . . . . . . . 56
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 62 6.7. Callback operation: CB_SEQUENCE . . . . . . . . . . . . 56
9. References . . . . . . . . . . . . . . . . . . . . . . . . 62 7. NFSv4 Session Protocol Description . . . . . . . . . . . . 58
9.1. Normative References . . . . . . . . . . . . . . . . . . 62 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 64
9.2. Informative References . . . . . . . . . . . . . . . . . 62 9. References . . . . . . . . . . . . . . . . . . . . . . . . 64
10. Authors' Addresses . . . . . . . . . . . . . . . . . . . . 64 9.1. Normative References . . . . . . . . . . . . . . . . . . 64
11. Full Copyright Statement . . . . . . . . . . . . . . . . . 65 9.2. Informative References . . . . . . . . . . . . . . . . . 65
10. Authors' Addresses . . . . . . . . . . . . . . . . . . . . 67
11. Full Copyright Statement . . . . . . . . . . . . . . . . . 67
1. Introduction 1. Introduction
This draft proposes extensions to NFS version 4 enabling it to This draft proposes extensions to NFS version 4 [RFC3530] enabling
support sessions and endpoint management, and to support operation it to support sessions and endpoint management, and to support
atop RDMA-capable RPC over transports such as iWARP. [RDMAP, DDP] operation atop RDMA-capable RPC over transports such as iWARP.
These extensions enable support for exactly-once semantics by NFSv4 [RDMAP, DDP] These extensions enable support for exactly-once
servers, multipathing and trunking of transport connections, and semantics by NFSv4 servers, multipathing and trunking of transport
enhanced security. The ability to operate over RDMA enables connections, and enhanced security. The ability to operate over
greatly enhanced performance. Operation over existing TCP is RDMA enables greatly enhanced performance. Operation over existing
enhanced as well. TCP is enhanced as well.
While discussed here with respect to IETF-chartered transports, the While discussed here with respect to IETF-chartered transports, the
proposed protocol is intended to function over other standards, proposed protocol is intended to function over other standards,
such as InfiniBand. [IB] such as InfiniBand. [IB]
The following are the major aspects of this proposal: The following are the major aspects of this proposal:
o Changes are proposed within the framework of NFSv4 minor o Changes are proposed within the framework of NFSv4 minor
versioning. RPC, XDR, and the NFSv4 procedures and operations versioning. RPC, XDR, and the NFSv4 procedures and operations
are preserved. The proposed extension functions equally well are preserved. The proposed extension functions equally well
over existing transports and RDMA, and interoperates over existing transports and RDMA, and interoperates
transparently with existing implementations, both at the local transparently with existing implementations, both at the local
programmatic interface and over the wire. programmatic interface and over the wire.
o An explicit session is introduced to NFSv4, and six new o An explicit session is introduced to NFSv4, and new operations
operations are added to support it. The session allows for are added to support it. The session allows for enhanced
enhanced trunking, failover and recovery, and authentication trunking, failover and recovery, and authentication
efficiency, along with necessary support for RDMA. The efficiency, along with necessary support for RDMA. The
session is implemented as operations within NFSv4 COMPOUND and session is implemented as operations within NFSv4 COMPOUND and
does not impact layering or interoperability with existing does not impact layering or interoperability with existing
NFSv4 implementations. The NFSv4 callback channel is NFSv4 implementations. The NFSv4 callback channel is
dynamically associated and is connected by the client and not dynamically associated and is connected by the client and not
the server, enhancing security and operation through the server, enhancing security and operation through
firewalls. In fact, the callback channel will be enabled to firewalls. In fact, the callback channel will be enabled to
share the same connection as the operations channel. share the same connection as the operations channel.
o An enhanced RPC layer enables NFSv4 operation atop RDMA. The o An enhanced RPC layer enables NFSv4 operation atop RDMA. The
skipping to change at page 9, line 17 skipping to change at page 9, line 20
In RFC3530, the combination of a connected transport endpoint and a In RFC3530, the combination of a connected transport endpoint and a
clientid forms the basis of connection state. While this has been made clientid forms the basis of connection state. While this has been made
to be workable with certain limitations, there are difficulties in to be workable with certain limitations, there are difficulties in
correct and robust implementation. The NFSv4.0 protocol must correct and robust implementation. The NFSv4.0 protocol must
provide a server-initiated connection for the callback channel, and provide a server-initiated connection for the callback channel, and
must carefully specify the persistence of client state at the must carefully specify the persistence of client state at the
server in the face of transport interruptions. The server has only server in the face of transport interruptions. The server has only
the client's transport address binding (the IP 4-tuple) to identify the client's transport address binding (the IP 4-tuple) to identify
the client RPC transaction stream and to use as a lookup tag on the the client RPC transaction stream and to use as a lookup tag on the
duplicate request cache. (A useful overview of this is in [RW96].) duplicate request cache. (A useful overview of this is in [RW96].)
If the server listens on multiple addresses and the client
connects to more than one, the client must employ a different clientid
on each, negating its ability to aggregate bandwidth and redundancy.
In effect, each transport connection is used as the server's In effect, each transport connection is used as the server's
representation of client state. But, transport connections are representation of client state. But, transport connections are
potentially fragile and transitory. potentially fragile and transitory.
In this proposal, a session identifier is assigned by the server In this proposal, a session identifier is assigned by the server
upon initial session negotiation on each connection. This upon initial session negotiation on each connection. This
identifier is used to associate additional connections, to identifier is used to associate additional connections, to
renegotiate after a reconnect, to provide an abstraction for the renegotiate after a reconnect, to provide an abstraction for the
various session properties, and to address the duplicate request various session properties, and to address the duplicate request
cache. No transport-specific information is used in the duplicate cache. No transport-specific information is used in the duplicate
skipping to change at page 10, line 35 skipping to change at page 10, line 43
example, reads and writes may be assigned to specific, optimized example, reads and writes may be assigned to specific, optimized
connections, or sorted and separated by any or all of size, connections, or sorted and separated by any or all of size,
idempotency, etc. idempotency, etc.
To address the problems described above, this proposal allows To address the problems described above, this proposal allows
multiple sessions to share a clientid, as well as for multiple multiple sessions to share a clientid, as well as for multiple
connections to share a session. connections to share a session.
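The contrast between keying the duplicate request cache on the transport 4-tuple and keying it on a session identifier can be illustrated with a minimal Python sketch. It assumes a toy cache and a per-request identifier standing in for the slot and sequence mechanism of section 3.2; the names `ReplyCache`, `tuple_key` and `session_key`, the addresses, and the 16-byte session identifier are all hypothetical and are not taken from the protocol's XDR.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical 4-tuple: (client IP, client port, server IP, server port).
FourTuple = Tuple[str, int, str, int]


@dataclass
class ReplyCache:
    """Toy duplicate request cache; the caller chooses the key type."""
    entries: Dict[Hashable, bytes] = field(default_factory=dict)

    def lookup(self, key: Hashable) -> Optional[bytes]:
        return self.entries.get(key)

    def store(self, key: Hashable, reply: bytes) -> None:
        self.entries[key] = reply


def tuple_key(conn: FourTuple, xid: int) -> Hashable:
    # RFC3530 style: the transport address binding identifies the stream.
    return (conn, xid)


def session_key(sessionid: bytes, request_id: int) -> Hashable:
    # Session style: no transport-specific information enters the key.
    return (sessionid, request_id)


# A request executes over one connection; the connection then breaks and the
# client retries over a new connection with a different source port.
old_conn: FourTuple = ("192.0.2.10", 801, "198.51.100.1", 2049)
new_conn: FourTuple = ("192.0.2.10", 802, "198.51.100.1", 2049)
sessionid = b"\x00" * 16  # hypothetical session identifier

by_tuple = ReplyCache()
by_tuple.store(tuple_key(old_conn, 7), b"WRITE OK")
by_session = ReplyCache()
by_session.store(session_key(sessionid, 7), b"WRITE OK")

# The retry misses in the 4-tuple cache, so the WRITE would be re-executed...
print(by_tuple.lookup(tuple_key(new_conn, 7)))       # None
# ...but hits in the session-keyed cache, whichever connection carries it.
print(by_session.lookup(session_key(sessionid, 7)))  # b'WRITE OK'
```

Because the session key carries nothing about the connection, the same lookup also works when the retry arrives on a second, trunked connection of the same session.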
Single Connection model: Single Connection model:
NFSv4.1 client instance NFSv4.1 Session
|
Session
/ \ / \
Operations_Channel [Back_Channel] Operations_Channel [Back_Channel]
\ / \ /
Connection Connection
| |
Multi-connection trunked model (2 operations channels shown): Multi-connection trunked model (2 operations channels shown):
NFSv4.1 client instance NFSv4.1 Session
|
Session
/ \ / \
Operations_Channels [Back_Channel] Operations_Channels [Back_Channel]
| | | | | |
Connection Connection [Connection] Connection Connection [Connection]
| | | | | |
Multi-connection split-use model (2 mounts shown): Multi-connection split-use model (2 mounts shown):
NFSv4.1 client instance NFSv4.1 Session
/ \ / \
Session Session
(/home) (/usr/local - readonly) (/home) (/usr/local - readonly)
/ \ | / \ |
Operations_Channel [Back_Channel] | Operations_Channel [Back_Channel] |
| | Operations_Channel | | Operations_Channel
Connection [Connection] | Connection [Connection] |
| | Connection | | Connection
| |
In this way, implementation as well as resource management may be In this way, implementation as well as resource management may be
optimized. Each session will have its own response caching and optimized. Each session will have its own response caching and
skipping to change at page 13, line 43 skipping to change at page 13, line 41
enabled for a given session, the session reply must inform the enabled for a given session, the session reply must inform the
client if the mode is in fact enabled. In this way the client can client if the mode is in fact enabled. In this way the client can
confidently proceed with operations without having to implement confidently proceed with operations without having to implement
consistency facilities of its own. consistency facilities of its own.
2.2. Session Initialization and Transfer Models 2.2. Session Initialization and Transfer Models
Session initialization issues and data transfer models relevant to Session initialization issues and data transfer models relevant to
both TCP and RDMA are discussed in this section. both TCP and RDMA are discussed in this section.
2.2.1. RDMA Requirements 2.2.1. Session Negotiation
A complete discussion of the operation of RPC-based protocols atop
RDMA transports is in [RPCRDMA], and a general discussion of NFS
RDMA requirements is in [RDMAREQ]. Where RDMA is considered, this
proposal assumes the use of such a layering; it addresses only the
upper layer issues relevant to making best use of RPC/RDMA.
A connection oriented (reliable sequenced) RDMA transport will be
required. There are several reasons for this. First, this model
most closely reflects the general NFSv4 requirement of long-lived
and congestion-controlled transports. Second, to operate correctly
over either an unreliable or unsequenced RDMA transport, or both,
would require significant complexity in the implementation and
protocol not appropriate for a strict minor version. For example,
retransmission on connected endpoints is explicitly disallowed in
the current NFSv4 draft; it would again be required with these
alternate transport characteristics. Third, the proposal assumes a
specific RDMA ordering semantic, which presents the same set of
ordering and reliability issues to the RDMA layer over such
transports.
The RDMA implementation provides for making connections to other
RDMA-capable peers. In the case of the current proposals before
the RDDP working group, these RDMA connections are preceded by a
"streaming" phase, where ordinary TCP (or NFS) traffic might flow.
However, this is not assumed here and sizes and other parameters
are explicitly exchanged upon a session entering RDMA mode.
2.2.2. Session Negotiation
Some of the parameters to be exchanged at session creation time are The following parameters are exchanged between client and server at
as follows. session creation time. Their values allow the server to properly
size resources allocated in order to service the client's requests,
and to provide the server with a way to communicate limits to the
client for proper and optimal operation. They are exchanged prior
to all session-related activity, over any transport type.
Discussion of their use is found in their descriptions as well as
throughout this section.
Maximum Requests Maximum Requests
The client's desired maximum number of concurrent requests is The client's desired maximum number of concurrent requests is
passed, in order to allow the server to size its reply cache passed, in order to allow the server to size its reply cache
storage. The server may modify the client's requested limit storage. The server may modify the client's requested limit
downward (or upward) to match its local policy and/or downward (or upward) to match its local policy and/or
resources. Over RDMA-capable RPC transports, the per-request resources. Over RDMA-capable RPC transports, the per-request
management of low-level transport message credits is handled management of low-level transport message credits is handled
within the RPC layer. [RPCRDMA] within the RPC layer. [RPCRDMA]
skipping to change at page 15, line 15 skipping to change at page 14, line 38
Inline Padding/Alignment Inline Padding/Alignment
The server can inform the client of any padding which can be The server can inform the client of any padding which can be
used to deliver NFSv4 inline WRITE payloads into aligned used to deliver NFSv4 inline WRITE payloads into aligned
buffers. Such alignment can be used to avoid data copy buffers. Such alignment can be used to avoid data copy
operations at the server for both TCP and inline RDMA operations at the server for both TCP and inline RDMA
transfers. For RDMA, the client informs the server in each transfers. For RDMA, the client informs the server in each
operation when padding has been applied. [RPCRDMA] operation when padding has been applied. [RPCRDMA]
Transport Attributes Transport Attributes
A placeholder for transport-specific attributes is provided, A placeholder for transport-specific attributes is provided,
with a format to be determined. Examples of information to be with a format to be determined. Possible examples of
passed in this parameter include transport security attributes information to be passed in this parameter include transport
to be used on the connection, RDMA-specific attributes, legacy security attributes to be used on the connection, RDMA-
"private data" as used on existing RDMA fabrics, transport specific attributes, legacy "private data" as used on existing
Quality of Service attributes, etc. This information is to be RDMA fabrics, transport Quality of Service attributes, etc.
passed to the peer's transport layer by local means which is This information is to be passed to the peer's transport layer
currently outside the scope of this draft; however, one by local means which is currently outside the scope of this
attribute is provided in the RDMA case: draft; however, one attribute is provided in the RDMA case:
RDMA Read Resources RDMA Read Resources
RDMA implementations must explicitly provision resources to RDMA implementations must explicitly provision resources
support RDMA Read requests from connected peers. These values to support RDMA Read requests from connected peers.
must be explicitly specified, to provide adequate resources These values must be explicitly specified, to provide
for matching the peer's expected needs and the connection's adequate resources for matching the peer's expected needs
delay-bandwidth parameters. The client provides its chosen and the connection's delay-bandwidth parameters. The
value to the server in the initial session creation; the value client provides its chosen value to the server in the
must be provided in each client RDMA endpoint. The values are initial session creation; the value must be provided in
asymmetric and should be set to zero at the server in order to each client RDMA endpoint. The values are asymmetric and
conserve RDMA resources, since clients do not issue RDMA Read should be set to zero at the server in order to conserve
operations in this proposal. The result is communicated in RDMA resources, since clients do not issue RDMA Read
the session response, to permit matching of values across the operations in this proposal. The result is communicated
connection. The value may not be changed for the duration of in the session response, to permit matching of values
the session, although a new value may be requested as part of across the connection. The value may not be changed for
a new session. the duration of the session, although a new value may be
requested as part of a new session.
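The session creation exchange described above can be sketched as follows. This is an illustration under stated assumptions, not the protocol's XDR: the field names are hypothetical, the server here only ever lowers the requested maximum (the text also permits raising it), and the padding rule assumes the server advertises an alignment boundary to which the client pads the inline WRITE header, a format this draft does not yet define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: parameters are fixed for the session's life
class SessionParams:
    max_requests: int      # concurrent requests; sizes the server reply cache
    inline_align: int      # hypothetical: alignment (bytes) for inline WRITEs
    client_rdma_read: int  # RDMA Read resources provisioned at the client
    server_rdma_read: int  # RDMA Read resources provisioned at the server


def server_negotiate(req: SessionParams, cache_slots: int,
                     align: int) -> SessionParams:
    """Hypothetical server reply to a session creation request."""
    return SessionParams(
        # This server only lowers the request; a server may also raise it.
        max_requests=max(1, min(req.max_requests, cache_slots)),
        inline_align=align,
        # Accept the client's value so both ends of the connection match.
        client_rdma_read=req.client_rdma_read,
        # Clients never issue RDMA Reads in this proposal, so the server
        # provisions none.
        server_rdma_read=0,
    )


def inline_pad(header_len: int, align: int) -> int:
    """Padding bytes that place a WRITE payload on an `align` boundary."""
    return (-header_len) % align if align > 1 else 0


request = SessionParams(max_requests=128, inline_align=0,
                        client_rdma_read=4, server_rdma_read=4)
granted = server_negotiate(request, cache_slots=64, align=4096)
print(granted.max_requests, granted.server_rdma_read)  # 64 0
print(inline_pad(300, granted.inline_align))           # 3796
```

Making the record immutable mirrors the rule that the RDMA Read value cannot change during a session; a different value requires a new session and therefore a new record.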
2.2.3. Connection Resources 2.2.2. RDMA Requirements
A complete discussion of the operation of RPC-based protocols atop
RDMA transports is in [RPCRDMA]. Where RDMA is considered, this
proposal assumes the use of such a layering; it addresses only the
upper layer issues relevant to making best use of RPC/RDMA.
A connection oriented (reliable sequenced) RDMA transport will be
required. There are several reasons for this. First, this model
most closely reflects the general NFSv4 requirement of long-lived
and congestion-controlled transports. Second, to operate correctly
over either an unreliable or unsequenced RDMA transport, or both,
would require significant complexity in the implementation and
protocol not appropriate for a strict minor version. For example,
retransmission on connected endpoints is explicitly disallowed in
the current NFSv4 draft; it would again be required with these
alternate transport characteristics. Third, the proposal assumes a
specific RDMA ordering semantic, which presents the same set of
ordering and reliability issues to the RDMA layer over such
transports.
The RDMA implementation provides for making connections to other
RDMA-capable peers. In the case of the current proposals before
the RDDP working group, these RDMA connections are preceded by a
"streaming" phase, where ordinary TCP (or NFS) traffic might flow.
However, this is not assumed here and sizes and other parameters
are explicitly exchanged upon a session entering RDMA mode.
2.2.3. RDMA Connection Resources
On transport endpoints which support automatic RDMA mode, that is, On transport endpoints which support automatic RDMA mode, that is,
endpoints which are created in the RDMA-enabled state, a single, endpoints which are created in the RDMA-enabled state, a single,
preposted buffer must initially be provided by both peers, and the preposted buffer must initially be provided by both peers, and the
client session negotiation must be the first exchange. client session negotiation must be the first exchange.
On transport endpoints supporting dynamic negotiation, a more On transport endpoints supporting dynamic negotiation, a more
sophisticated negotiation is possible, but is not discussed in the sophisticated negotiation is possible, but is not discussed in the
current draft. current draft.
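The constraint on automatic RDMA mode endpoints, a single preposted buffer followed by session negotiation as the first exchange, can be modelled as a small state machine. The class below is a hypothetical sketch; it uses `CREATESESSION` as the operation name and simply reposts one buffer per received request after negotiation, which glosses over real RDMA receive queue handling.

```python
class AutoRdmaEndpoint:
    """Hypothetical server endpoint created in the RDMA-enabled state."""

    def __init__(self) -> None:
        self.posted = 1          # exactly one preposted receive buffer
        self.session_up = False

    def on_receive(self, op: str, credits: int = 0) -> None:
        if self.posted == 0:
            # A Send arriving with no posted receive is fatal to the connection.
            raise ConnectionError("receive buffer overrun")
        self.posted -= 1         # the incoming Send consumes one buffer
        if not self.session_up:
            if op != "CREATESESSION":
                raise ConnectionError("first exchange must negotiate the session")
            self.session_up = True
            self.posted = credits  # now provision the negotiated credit count
        else:
            self.posted += 1     # simplification: repost after each request


ep = AutoRdmaEndpoint()
ep.on_receive("CREATESESSION", credits=16)
print(ep.posted)  # 16
```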
skipping to change at page 16, line 25 skipping to change at page 16, line 29
the RPC layer to handle receives. These buffers remain in use by the RPC layer to handle receives. These buffers remain in use by
the RPC/NFSv4 implementation; the size and number of them must be the RPC/NFSv4 implementation; the size and number of them must be
known to the remote peer in order to avoid RDMA errors which would known to the remote peer in order to avoid RDMA errors which would
cause a fatal error on the RDMA connection. cause a fatal error on the RDMA connection.
The session provides a natural way for the server to manage The session provides a natural way for the server to manage
resource allocation to each client rather than to each transport resource allocation to each client rather than to each transport
connection itself. This enables considerable flexibility in the connection itself. This enables considerable flexibility in the
administration of transport endpoints. administration of transport endpoints.
2.2.4. Inline Transfer Model 2.2.4. TCP and RDMA Inline Transfer Model
The basic transfer model for both TCP and RDMA is referred to as The basic transfer model for both TCP and RDMA is referred to as
"inline". For TCP, this is the only transfer model supported, "inline". For TCP, this is the only transfer model supported,
since TCP carries both the RPC header and data together in the data since TCP carries both the RPC header and data together in the data
stream. stream.
For RDMA, the RDMA Send transfer model is used for all NFS requests For RDMA, the RDMA Send transfer model is used for all NFS requests
and replies, but data is optionally carried by RDMA Writes or RDMA and replies, but data is optionally carried by RDMA Writes or RDMA
Reads. Use of Sends is required to ensure consistency of data and Reads. Use of Sends is required to ensure consistency of data and
to deliver completion notifications. The pure-Send method is to deliver completion notifications. The pure-Send method is
skipping to change at page 17, line 21 skipping to change at page 17, line 31
buffer : : buffer : :
Client Server Client Server
: Write request with data : : Write request with data :
Send : ------------------------------> : untagged Send : ------------------------------> : untagged
: : buffer : : buffer
: Write response : : Write response :
untagged : <------------------------------ : Send untagged : <------------------------------ : Send
buffer : : buffer : :
Responses must be sent to the client on the same RDMA connection Responses must be sent to the client on the same connection on which
on which the request was sent. This is important to preserve ordering the request was sent. It is important that the server does not
of operations, and especially RDMA consistency. Additionally, it assume any specific client implementation, in particular whether
ensures that the RPC RDMA layer makes no requirement of the RDMA connections within a session share any state at the client. This
provider to open its memory registration handles (Steering Tags) is also important to preserve ordering of RDMA operations, and
beyond the scope of a single RDMA connection. This is an important especially RDMA consistency. Additionally, it ensures that the RPC
security consideration. RDMA layer makes no requirement of the RDMA provider to open its
memory registration handles (Steering Tags) beyond the scope of a
single RDMA connection. This is an important security
consideration.
Two values must be known to each peer prior to issuing Sends: the Two values must be known to each peer prior to issuing Sends: the
maximum number of sends which may be posted, and their maximum maximum number of sends which may be posted, and their maximum
size. These values are referred to, respectively, as the message size. These values are referred to, respectively, as the message
credits and the maximum message size. While the message credits credits and the maximum message size. While the message credits
might vary dynamically over the duration of the session, the might vary dynamically over the duration of the session, the
maximum message size does not. The server must commit to posting a maximum message size does not. The server must commit to
number of receive buffers equal to or greater than its currently preserving this number of duplicate request cache entires, and
advertised credit value, each of the advertised size. If fewer preparing a number of receive buffers equal to or greater than its
credits or smaller buffers are provided, the connection may fail currently advertised credit value, each of the advertised size.
with an RDMA transport error. These ensure that transport resources are allocated sufficient to
receive the full advertised limits.
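From the sender's side, the two advertised values act as a simple gate: never more outstanding Sends than the current credit value, and never a message larger than the fixed maximum size. The class below is a minimal sketch under the assumption that each reply both frees one of the peer's receive buffers and may carry a revised credit value, roughly as RPC/RDMA conveys credits [RPCRDMA]; its names are hypothetical.

```python
class SendCredits:
    """Hypothetical sender-side view of the peer's advertised receive limits."""

    def __init__(self, credits: int, max_msg_size: int) -> None:
        self.credits = credits            # may be revised by the peer later
        self.max_msg_size = max_msg_size  # fixed for the life of the session
        self.outstanding = 0

    def can_send(self, size: int) -> bool:
        return size <= self.max_msg_size and self.outstanding < self.credits

    def send(self, size: int) -> None:
        if not self.can_send(size):
            raise RuntimeError("would overrun the peer's receive buffers")
        self.outstanding += 1

    def on_reply(self, new_credits: int) -> None:
        # Each reply frees one peer receive buffer and may revise the credits.
        if self.outstanding == 0:
            raise RuntimeError("reply without an outstanding request")
        self.outstanding -= 1
        self.credits = new_credits


s = SendCredits(credits=2, max_msg_size=1024)
s.send(512)
s.send(1024)
print(s.can_send(100))   # False: both credits are in use
s.on_reply(new_credits=2)
print(s.can_send(100))   # True
print(s.can_send(2048))  # False: exceeds the maximum message size
```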
Note that the server must post the maximum number of session Note that the server must post the maximum number of session
requests to each client operations channel. It is not possible for requests to each client operations channel. The client is not
the client to spread its requests in any particular fashion across required to spread its requests in any particular fashion across
connections within a session. Instead, the client may create connections within a session. If the client wishes, it may create
multiple sessions, each with a single or small number of operations multiple sessions, each with a single or small number of operations
channels to provide the server with this resource advantage. Or, channels to provide the server with this resource advantage. Or,
the server may employ a "shared receive queue". The server can in over RDMA the server may employ a "shared receive queue". The
any case protect its resources by restricting the client's request server can in any case protect its resources by restricting the
credits. client's request credits.
While tempting to consider, it is not possible to use the TCP While tempting to consider, it is not possible to use the TCP
window as an RDMA operation flow control mechanism. First, to do window as an RDMA operation flow control mechanism. First, to do
so would violate layering, requiring both senders to be aware of so would violate layering, requiring both senders to be aware of
the existing TCP outbound window at all times. Second, since the existing TCP outbound window at all times. Second, since
requests are of variable size, the TCP window can hold a widely requests are of variable size, the TCP window can hold a widely
variable number of them, and since it cannot be reduced without variable number of them, and since it cannot be reduced without
actually receiving data, the receiver cannot limit the sender. actually receiving data, the receiver cannot limit the sender.
Third, any middlebox interposing on the connection would wreck any Third, any middlebox interposing on the connection would wreck any
possible scheme. [MIDTAX] In this proposal, maximum request count possible scheme. [MIDTAX] In this proposal, maximum request count
limits are exchanged at the session level to allow correct limits are exchanged at the session level to allow correct
provisioning of receive buffers by transports. provisioning of receive buffers by transports.
When not operating over RDMA, request limits and sizes are still When operating over TCP or other similar transport, request limits
employed in NFSv4.1, but instead of being required for correctness, and sizes are still employed in NFSv4.1, but instead of being
they provide the basis for efficient server implementation of the required for correctness, they provide the basis for efficient
duplicate request cache. The limits are chosen based upon the server implementation of the duplicate request cache. The limits
expected needs and capabilities of the client and server, and are are chosen based upon the expected needs and capabilities of the
in fact arbitrary. Sizes may be specified by the client as zero client and server, and are in fact arbitrary. Sizes may be
(requesting the server's preferred or optimal value), and request specified by the client as zero (requesting the server's preferred
limits may be chosen in proportion to the client's capabilities. or optimal value), and request limits may be chosen in proportion
For example, a limit of 1000 allows 1000 requests to be in to the client's capabilities. For example, a limit of 1000 allows
progress, which may generally be far more than adequate to keep 1000 requests to be in progress, which may generally be far more
local networks and servers fully utilized. than adequate to keep local networks and servers fully utilized.
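A minimal sketch of how a server might resolve one of these client-proposed sizes, such as maxrequestsize, follows. The zero-means-server's-choice behavior is taken from the text above; the policy structure and the clamping to a server ceiling are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical server policy for one negotiated value. */
struct server_policy {
    uint32_t preferred;  /* value used when the client proposes zero */
    uint32_t ceiling;    /* largest value the server is willing to grant */
};

/* Resolve a client-proposed size or limit into the value the server
 * returns in its reply. */
static uint32_t negotiate_limit(uint32_t requested,
                                const struct server_policy *p)
{
    if (requested == 0)
        return p->preferred;  /* zero asks for the server's choice */
    return requested < p->ceiling ? requested : p->ceiling;
}
```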
Both client and server have independent sizes and buffering, but Both client and server have independent sizes and buffering, but
over RDMA fabrics client credits are easily managed by posting a over RDMA fabrics client credits are easily managed by posting a
receive buffer prior to sending each request. Each such buffer may receive buffer prior to sending each request. Each such buffer may
not be completed with the corresponding reply, since responses from not be completed with the corresponding reply, since responses from
NFSv4 servers arrive in arbitrary order. When an operations NFSv4 servers arrive in arbitrary order. When an operations
channel is also used for callbacks, the client must account for channel is also used for callbacks, the client must account for
callback requests by posting additional buffers. Note that callback requests by posting additional buffers. Note that
implementation-specific facilities such as a shared receive queue implementation-specific facilities such as a shared receive queue
may also allow optimization of these allocations. may also allow optimization of these allocations.
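The client-side bookkeeping described above might be sketched as follows. All names are hypothetical; the sketch assumes the client pre-posts one receive buffer per callback slot when the channel also carries callbacks, and treats every reply as consuming whichever posted buffer it completed into, since replies arrive in arbitrary order.

```c
#include <stdint.h>

/* Hypothetical client accounting for one operations channel. */
struct client_chan {
    uint32_t credits;       /* request credits granted by the server */
    uint32_t outstanding;   /* requests sent whose replies have not arrived */
    uint32_t posted_recvs;  /* receive buffers currently posted */
    uint32_t cb_slots;      /* callbacks the server may send on this channel */
};

static void client_chan_init(struct client_chan *c, uint32_t credits,
                             uint32_t cb_slots)
{
    c->credits = credits;
    c->outstanding = 0;
    c->cb_slots = cb_slots;
    c->posted_recvs = cb_slots;  /* pre-post buffers for inbound callbacks */
}

/* Post one receive buffer ahead of a request. Returns 0 (and changes
 * nothing) if all credits are in use. */
static int client_prepare_send(struct client_chan *c)
{
    if (c->outstanding >= c->credits)
        return 0;
    c->posted_recvs++;  /* any reply may land here, not only this request's */
    c->outstanding++;
    return 1;
}

/* A reply consumes one posted buffer, whichever it completed into. */
static void client_reply_received(struct client_chan *c)
{
    c->posted_recvs--;
    c->outstanding--;
}
```

Reposting a buffer after each inbound callback is omitted for brevity.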
skipping to change at page 19, line 16 skipping to change at page 19, line 29
procedures. Since an arbitrary number (total size) of operations procedures. Since an arbitrary number (total size) of operations
can be specified in a single COMPOUND procedure, its size is can be specified in a single COMPOUND procedure, its size is
effectively unbounded. This cannot be supported by RDMA Sends, and effectively unbounded. This cannot be supported by RDMA Sends, and
therefore this size negotiation places a restriction on the therefore this size negotiation places a restriction on the
construction and maximum size of both COMPOUND requests and construction and maximum size of both COMPOUND requests and
responses. If a COMPOUND results in a reply at the server that is responses. If a COMPOUND results in a reply at the server that is
larger than can be sent in an RDMA Send to the client, then the larger than can be sent in an RDMA Send to the client, then the
COMPOUND must terminate and the operation which causes the overflow COMPOUND must terminate and the operation which causes the overflow
will provide a TOOSMALL error status result. will provide a TOOSMALL error status result.
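The termination rule for an oversized COMPOUND reply can be sketched as a pass over the encoded sizes of the operation results. The function and structure names are hypothetical. NFS4ERR_TOOSMALL uses the value assigned in [RFC3530]. The sketch also assumes that space for the short TOOSMALL result itself has been reserved.

```c
#include <stddef.h>
#include <stdint.h>

#define NFS4_OK          0
#define NFS4ERR_TOOSMALL 10005  /* value from RFC 3530 */

struct compound_fit {
    size_t   ops_encoded;  /* operation results that fit completely */
    uint32_t status;       /* status of the COMPOUND as a whole */
};

/* result_sizes[i] is the encoded size of operation i's result. */
static struct compound_fit fit_compound_reply(const size_t *result_sizes,
                                              size_t nops,
                                              size_t header_size,
                                              size_t max_response)
{
    struct compound_fit fit = { 0, NFS4_OK };
    size_t used = header_size;

    if (used > max_response) {
        fit.status = NFS4ERR_TOOSMALL;
        return fit;
    }
    for (size_t i = 0; i < nops; i++) {
        /* used <= max_response here, so the subtraction cannot wrap */
        if (result_sizes[i] > max_response - used) {
            fit.status = NFS4ERR_TOOSMALL;  /* operation i overflows: stop */
            return fit;
        }
        used += result_sizes[i];
        fit.ops_encoded++;
    }
    return fit;
}
```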
2.2.5. Direct Transfer Model 2.2.5. RDMA Direct Transfer Model
Placement of data by explicitly tagged RDMA operations is referred Placement of data by explicitly tagged RDMA operations is referred
to as "direct" transfer. This method is typically used where the to as "direct" transfer. This method is typically used where the
data payload is relatively large, that is, when RDMA setup has been data payload is relatively large, that is, when RDMA setup has been
performed prior to the operation, or when any overhead for setting performed prior to the operation, or when any overhead for setting
up and performing the transfer is regained by avoiding the overhead up and performing the transfer is regained by avoiding the overhead
of processing an ordinary receive. of processing an ordinary receive.
The client advertises RDMA buffers in this proposed model, and not The client advertises RDMA buffers in this proposed model, and not
the server. This means the "XDR Decoding with Read Chunks" the server. This means the "XDR Decoding with Read Chunks"
skipping to change at page 32, line 25 skipping to change at page 32, line 25
that can be proposed when considering extensions. that can be proposed when considering extensions.
To support the duplicate request cache integrated with sessions and To support the duplicate request cache integrated with sessions and
request control, it is desirable to tag each request with an request control, it is desirable to tag each request with an
identifier to be called a Slotid. This identifier must be passed identifier to be called a Slotid. This identifier must be passed
by NFSv4 when running atop any transport, including traditional by NFSv4 when running atop any transport, including traditional
TCP. Therefore it is not desirable to add the Slotid to a new RPC TCP. Therefore it is not desirable to add the Slotid to a new RPC
transport, even though such a transport is indicated for support of transport, even though such a transport is indicated for support of
RDMA. This draft and [RPCRDMA] do not propose such an approach. RDMA. This draft and [RPCRDMA] do not propose such an approach.
Instead, this proposal confirms to the requirements of NFSv4 minor Instead, this proposal conforms to the requirements of NFSv4 minor
versioning, through the use of a new operation within NFSv4 versioning, through the use of a new operation within NFSv4
COMPOUND procedures as detailed below. COMPOUND procedures as detailed below.
If sessions are in use for a given clientid, this same clientid
cannot be used for non-session NFSv4 operation, including NFSv4.0.
Because the server will have allocated session-specific state to
the active clientid, it would be an unnecessary burden on the
server implementor to support and account for additional, non-
session traffic, in addition to being of no benefit. Therefore
this proposal prohibits a single clientid from being used for both.
Nevertheless, employing a new clientid for such traffic is
supported.
3.2. Slot Identifiers and Server Duplicate Request Cache 3.2. Slot Identifiers and Server Duplicate Request Cache
The presence of deterministic maximum request limits on a session The presence of deterministic maximum request limits on a session
enables in-progress requests to be assigned unique values with enables in-progress requests to be assigned unique values with
useful properties. useful properties.
The RPC layer provides a transaction ID (xid), which, while The RPC layer provides a transaction ID (xid), which, while
required to be unique, is not especially convenient for tracking required to be unique, is not especially convenient for tracking
requests. The transaction ID is only meaningful to the issuer requests. The transaction ID is only meaningful to the issuer
(client), it cannot be interpreted at the server except to test for (client), it cannot be interpreted at the server except to test for
skipping to change at page 34, line 40 skipping to change at page 34, line 51
granted maximum request count to the client, it may not be able to granted maximum request count to the client, it may not be able to
use receipt of the slotid to retire cache entries. The slotid used use receipt of the slotid to retire cache entries. The slotid used
in an incoming request may not reflect the server's current idea of in an incoming request may not reflect the server's current idea of
the client's session limit, because the request may have been sent the client's session limit, because the request may have been sent
from the client before the update was received. Therefore, in the from the client before the update was received. Therefore, in the
downward adjustment case, the server may have to retain a number of downward adjustment case, the server may have to retain a number of
duplicate request cache entries at least as large as the old value, duplicate request cache entries at least as large as the old value,
until operation sequencing rules allow it to infer that the client until operation sequencing rules allow it to infer that the client
has seen its reply. has seen its reply.
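A minimal sketch of a slot-indexed duplicate request cache lookup follows. The per-slot sequencing rule is an assumption here, since it is not spelled out in this section. Under that rule, a sequenceid one greater than the slot's last value marks a new request, an equal value marks a retransmission, and any other value is misordered. A slotid at or beyond the table size corresponds to NFS4ERR_BADSLOT. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum slot_verdict { SLOT_NEW, SLOT_REPLAY, SLOT_MISORDERED, SLOT_BAD };

struct slot {
    uint32_t    seqid;  /* sequenceid of the last request on this slot */
    const void *reply;  /* cached reply for that request (opaque here) */
};

struct slot_table {
    struct slot *slots;
    uint32_t     nslots;  /* kept at the old size after a downward
                             adjustment until entries can be retired */
};

static enum slot_verdict slot_check(struct slot_table *t, uint32_t slotid,
                                    uint32_t seqid)
{
    if (slotid >= t->nslots)
        return SLOT_BAD;               /* NFS4ERR_BADSLOT */
    struct slot *s = &t->slots[slotid];
    if (seqid == s->seqid)
        return SLOT_REPLAY;            /* retransmission: resend s->reply */
    if (seqid == s->seqid + 1) {       /* unsigned wraparound is intended */
        s->seqid = seqid;
        s->reply = NULL;               /* filled in once the request executes */
        return SLOT_NEW;
    }
    return SLOT_MISORDERED;
}
```

Because the slotid indexes the cache directly, the server needs no search keyed on xid to find a prior reply.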
The SEQUENCE operation also carries a "maxslot" value which carries The SEQUENCE (and CB_SEQUENCE) operation also carries a "maxslot"
additional client slot usage information. The client must always value which carries additional client slot usage information. The
provide its highest-numbered outstanding slot value in the maxslot client must always provide its highest-numbered outstanding slot
argument, and the server may reply with a new recognized value. value in the maxslot argument, and the server may reply with a new
The client should in all cases provide the most conservative value recognized value. The client should in all cases provide the most
possible, although it can be increased somewhat above the actual conservative value possible, although it can be increased somewhat
instantaneous usage to maintain some minimum or optimal level. above the actual instantaneous usage to maintain some minimum or
This provides a way for the client to yield unused request slots optimal level. This provides a way for the client to yield unused
back to the server, which in turn can use the information to request slots back to the server, which in turn can use the
reallocate resources. Obviously, maxslot can never be zero, or the information to reallocate resources. Obviously, maxslot can never
session would deadlock. be zero, or the session would deadlock.
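The client's maxslot calculation, together with honoring the server's target_maxslot when choosing a slot, might be sketched as below. Slots are assumed to be numbered from zero. The floor value is a hypothetical client tuning knob, and the rule that maxslot is never zero is applied as a final clamp.

```c
#include <stdint.h>

/* Highest-numbered outstanding slot, raised to an optional floor,
 * never zero. in_use[i] is nonzero while slot i has a request pending. */
static uint32_t client_maxslot(const uint8_t *in_use, uint32_t nslots,
                               uint32_t floor_value)
{
    uint32_t highest = 0;
    for (uint32_t i = 0; i < nslots; i++)
        if (in_use[i])
            highest = i;
    uint32_t v = highest > floor_value ? highest : floor_value;
    return v == 0 ? 1 : v;  /* a maxslot of zero would deadlock the session */
}

/* Lowest free slot not above the server's target_maxslot, or -1 if the
 * client must wait for an outstanding request to complete. */
static int64_t client_pick_slot(const uint8_t *in_use, uint32_t nslots,
                                uint32_t target_maxslot)
{
    for (uint32_t i = 0; i < nslots && i <= target_maxslot; i++)
        if (!in_use[i])
            return (int64_t)i;
    return -1;
}
```

Choosing the lowest free slot keeps the reported maxslot small, which is what lets the client yield unused slots back to the server.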
The server also provides a target maxslot value to the client, The server also provides a target maxslot value to the client,
which is an indication to the client of the maxslot the server which is an indication to the client of the maxslot the server
wishes the client to be using. This permits the server to withdraw wishes the client to be using. This permits the server to withdraw
(or add) resources from a client that has been found to not be (or add) resources from a client that has been found to not be
using them, in order to more fairly share resources among a varying using them, in order to more fairly share resources among a varying
level of demand from other clients. The client must always comply level of demand from other clients. The client must always comply
with the server's value updates, since they indicate newly with the server's value updates, since they indicate newly
established hard limits on the client's access to session established hard limits on the client's access to session
resources. However, because of request pipelining, the client may resources. However, because of request pipelining, the client may
skipping to change at page 36, line 42 skipping to change at page 36, line 42
recursive calls, etc). Often, such conversions are carried out recursive calls, etc). Often, such conversions are carried out
even when no size or byte order conversion is necessary. even when no size or byte order conversion is necessary.
It is recommended that implementations pay close attention to the It is recommended that implementations pay close attention to the
details of memory referencing in such code. It is far more details of memory referencing in such code. It is far more
efficient to inspect data in place, using native facilities to deal efficient to inspect data in place, using native facilities to deal
with word size and byte order conversion into registers or local with word size and byte order conversion into registers or local
variables, rather than formally (and blindly) performing the variables, rather than formally (and blindly) performing the
operation via fetch, reallocate and store. operation via fetch, reallocate and store.
Of particular concern is the result of the READDIR_DIRECT Of particular concern is the result of the READDIR operation, in
operation, in which such encoding abounds. which such encoding abounds.
3.5. Effect of Sessions on Existing Operations 3.5. Effect of Sessions on Existing Operations
The use of a session replaces the use of the SETCLIENTID and The use of a session replaces the use of the SETCLIENTID and
SETCLIENTID_CONFIRM operations, and allows certain simplification SETCLIENTID_CONFIRM operations, and allows certain simplification
of the RENEW and callback addressing mechanisms in the base of the RENEW and callback addressing mechanisms in the base
protocol. protocol.
The cb_program and cb_location which are obtained by the server in The cb_program and cb_location which are obtained by the server in
SETCLIENTID_CONFIRM must not be used by the server, because the SETCLIENTID_CONFIRM must not be used by the server, because the
NFSv4.1 client performs callback channel designation with NFSv4.1 client performs callback channel designation with
BIND_BACKCHANNEL. Therefore the SETCLIENTID and BIND_BACKCHANNEL. Therefore the SETCLIENTID and
SETCLIENTID_CONFIRM operations become obsolete when sessions are SETCLIENTID_CONFIRM operations become obsolete when sessions are
in use, and a server should return an error to NFSv4.1 clients in use, and a server should return an error to NFSv4.1 clients
which might issue either operation. which might issue either operation.
Since the session carries the client indication with it implicitly, Another favorable result of the session is that the server is able
any request on a session associated with a given client will renew to avoid requiring the client to perform OPEN_CONFIRM operations.
that client's leases. Therefore the RENEW operation is made The existence of a reliable and effective DRC means that the server
unnecessary when a session is present, as any request (e.g. a will be able to determine whether an OPEN request carrying a
SEQUENCE operation with or without and additional NFSv4 operations) previously known open_owner from a client is or is not a
performs its function. It is possible (though this proposal does retransmission. Because of this, the server no longer requires
not make any recommendation) that the RENEW operation could be made OPEN_CONFIRM to verify whether the client is retransmitting an open
obsolete. request. This in turn eliminates the server's reason for
requesting OPEN_CONFIRM - the server can simply replace any
previous information on this open_owner. Client OPEN operations
are therefore streamlined, reducing overhead and latency through
avoiding the additional OPEN_CONFIRM exchange.
Since the session carries the client liveness indication with it
implicitly, any request on a session associated with a given client
will renew that client's leases. Therefore the RENEW operation is
made unnecessary when a session is present, as any request
(including a SEQUENCE operation with or without additional NFSv4
operations) performs its function. It is possible (though this
proposal does not make any recommendation) that the RENEW operation
could be made obsolete.
An interesting issue arises however if an error occurs on such a
SEQUENCE operation. If the SEQUENCE operation fails, perhaps due
to an invalid slotid or other non-renewal-based issue, the server
may or may not have performed the RENEW. In this case, the state
of any renewal is undefined, and the client should make no
assumption that it has been performed. In practice, this should
not occur, but even if it did, it is expected that the client would
perform some sort of recovery which would result in a new,
successful, SEQUENCE operation being run and the client assured
that the renewal took place.
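One conservative way for a client to track renewal, consistent with the text above, is to credit a renewal only when SEQUENCE succeeds and to date it from when the request was sent. This is a sketch under that assumption. The structure, the names, and the use of seconds are hypothetical.

```c
#include <stdint.h>

/* Hypothetical client lease bookkeeping; times are in seconds. */
struct lease_state {
    int64_t renewed_at;  /* send time of newest request known to renew */
    int64_t lease_time;  /* lease period granted by the server */
};

/* Credit a renewal only on success, dated from the send time, since
 * the server may have processed the request at any point after it. */
static void on_sequence_reply(struct lease_state *l, int64_t sent_at,
                              int sequence_ok)
{
    if (sequence_ok && sent_at > l->renewed_at)
        l->renewed_at = sent_at;
    /* On failure the renewal state is undefined: assume nothing. */
}

static int lease_expired(const struct lease_state *l, int64_t now)
{
    return now - l->renewed_at >= l->lease_time;
}
```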
3.6. Authentication Efficiencies 3.6. Authentication Efficiencies
NFSv4 requires the use of the RPCSEC_GSS ONC RPC security flavor NFSv4 requires the use of the RPCSEC_GSS ONC RPC security flavor
[RFC2203] to provide authentication, integrity, and privacy via [RFC2203] to provide authentication, integrity, and privacy via
cryptography. The server dictates to the client the use of cryptography. The server dictates to the client the use of
RPCSEC_GSS, the service (authentication, integrity, or privacy), RPCSEC_GSS, the service (authentication, integrity, or privacy),
and the specific GSS-API security mechanism that each remote and the specific GSS-API security mechanism that each remote
procedure call and result will use. procedure call and result will use.
skipping to change at page 39, line 21 skipping to change at page 39, line 47
If the NFS client wishes to maintain full control over RPCSEC_GSS If the NFS client wishes to maintain full control over RPCSEC_GSS
protection, it may still perform its transfer operations using protection, it may still perform its transfer operations using
either the inline or RDMA transfer model, or of course employ either the inline or RDMA transfer model, or of course employ
traditional TCP stream operation. In the RDMA inline case, header traditional TCP stream operation. In the RDMA inline case, header
padding is recommended to optimize behavior at the server. At the padding is recommended to optimize behavior at the server. At the
client, close attention should be paid to the implementation of client, close attention should be paid to the implementation of
RPCSEC_GSS processing to minimize memory referencing and especially RPCSEC_GSS processing to minimize memory referencing and especially
copying. These are well-advised in any case! copying. These are well-advised in any case!
Proper authentication of the session and clientid creation
operation of the proposed NFSv4.1 exactly follows the similar
requirement on client identifiers in NFSv4.0. It must not be
possible for a client to bind a callback channel to an existing
session by guessing its session identifier. To protect against
this, NFSv4.0 requires appropriate authentication and matching of
the principal used. This is discussed in Section 16, Security
Considerations of [RFC3530]. The same requirement before binding
to a session identifier applies here.
The proposed session callback channel binding improves security The proposed session callback channel binding improves security
over that provided by NFSv4 for the callback channel. The over that provided by NFSv4 for the callback channel. The
connection is client-initiated, and subject to the same firewall connection is client-initiated, and subject to the same firewall
and routing checks as the operations channel. The connection and routing checks as the operations channel. The connection
cannot be hijacked by an attacker who connects to the client port cannot be hijacked by an attacker who connects to the client port
prior to the intended server. The connection is set up by the prior to the intended server. The connection is set up by the
client with its desired attributes, such as optionally securing client with its desired attributes, such as optionally securing
with IPsec or similar. The binding is fully authenticated before with IPsec or similar. The binding is fully authenticated before
being activated. being activated.
4.1. Authentication
Proper authentication of the principal which issues any session and
clientid in the proposed NFSv4.1 operations exactly follows the
similar requirement on client identifiers in NFSv4.0. It must not
be possible for a client to impersonate another by guessing its
session identifiers for NFSv4.1 operations, nor to bind a callback
channel to an existing session. To protect against this, NFSv4.0
requires appropriate authentication and matching of the principal
used. This is discussed in Section 16, Security Considerations of
[RFC3530]. The same requirement when using a session identifier
applies to NFSv4.1 here.
Going beyond NFSv4.0, the presence of a session associated with any
clientid may also be used to enhance NFSv4.1 security with respect
to client impersonation. In NFSv4.0, there are many operations
which carry no clientid, including in particular those which employ
a stateid argument. A rogue client which wished to carry out a
denial of service attack on another client could perform CLOSE,
DELEGRETURN, etc. operations with that client's current filehandle,
sequenceid and stateid, after having obtained them from
eavesdropping or other means. Locking and open downgrade
operations could be similarly attacked.
When an NFSv4.1 session is in place for any clientid,
countermeasures are easily applied through use of authentication by
the server. Because the clientid and sessionid must be present in
each request within a session, the server may verify that the
clientid is in fact originating from a principal with the
appropriate authenticated credentials, that the sessionid belongs
to the clientid, and that the stateid is valid in these contexts.
This is in general not possible with the affected operations in
NFSv4.0 due to the fact that the clientid is not present in the
requests.
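The server-side checks described above might be sketched as follows. The record types are hypothetical stand-ins for whatever the server looks up from the clientid, sessionid, and stateid in the request. The result codes are illustrative rather than protocol error values.

```c
#include <stdint.h>
#include <string.h>

enum ctx_check { CTX_OK, CTX_WRONG_PRINCIPAL, CTX_BADSESSION, CTX_BAD_STATEID };

/* Hypothetical server records, already looked up from the request. */
struct client_rec  { uint64_t clientid; const char *principal; };
struct session_rec { unsigned char sessionid[16]; uint64_t clientid; };
struct stateid_rec { uint64_t owner_clientid; };

/* st may be NULL for operations that carry no stateid. */
static enum ctx_check check_request_context(const struct client_rec *c,
                                            const struct session_rec *s,
                                            const struct stateid_rec *st,
                                            const char *authenticated_principal)
{
    /* The clientid must originate from its authenticated principal. */
    if (strcmp(c->principal, authenticated_principal) != 0)
        return CTX_WRONG_PRINCIPAL;
    /* The sessionid must belong to that clientid. */
    if (s->clientid != c->clientid)
        return CTX_BADSESSION;
    /* Any stateid must be valid in the context of that clientid. */
    if (st != NULL && st->owner_clientid != c->clientid)
        return CTX_BAD_STATEID;
    return CTX_OK;
}
```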
In the event that authentication information is not available in
the incoming request, for example after a reconnection when the
security was previously downgraded using CCM, the server must
require that the client re-establish the authentication so that
the server may validate the other client-provided context, prior to
executing any operation. The sessionid, present in the newly
retransmitted request, together with the retransmission detection
enabled by the NFSv4.1 duplicate request cache, provides a convenient
and reliable context for the server to use for this contingency.
The server should take care to protect itself against denial of The server should take care to protect itself against denial of
service attacks in the creation of sessions and clientids. Clients service attacks in the creation of sessions and clientids. Clients
who connect and create sessions, only to disconnect and never use who connect and create sessions, only to disconnect and never use
them may leave significant state behind. (The same issue applies them may leave significant state behind. (The same issue applies
to NFSv4.0 with clients who may perform SETCLIENTID, then never to NFSv4.0 with clients who may perform SETCLIENTID, then never
perform SETCLIENTID_CONFIRM.) Careful authentication coupled with perform SETCLIENTID_CONFIRM.) Careful authentication coupled with
resource checks is highly recommended. resource checks is highly recommended.
5. IANA Considerations 5. IANA Considerations
skipping to change at page 45, line 39 skipping to change at page 46, line 39
{ id_arg, verifier_arg, *, clientid_ret, FALSE } { id_arg, verifier_arg, *, clientid_ret, FALSE }
ERRORS ERRORS
NFS4ERR_BADXDR NFS4ERR_BADXDR
NFS4ERR_CLID_INUSE NFS4ERR_CLID_INUSE
NFS4ERR_INVAL NFS4ERR_INVAL
NFS4ERR_RESOURCE NFS4ERR_RESOURCE
NFS4ERR_SERVERFAULT NFS4ERR_SERVERFAULT
6.2. Operation: CREATE_SESSION - Create New Session and Confirm 6.2. Operation: CREATESESSION - Create New Session and Confirm Clientid
Clientid
SYNOPSIS SYNOPSIS
clientid, sessionid, session_args -> session_args clientid, session_args -> sessionid, session_args
ARGUMENT ARGUMENT
struct CREATESESSION4args { struct CREATESESSION4args {
clientid4 clientid; clientid4 clientid;
bool persist; bool persist;
count4 maxrequestsize; count4 maxrequestsize;
count4 maxresponsesize; count4 maxresponsesize;
count4 maxrequests; count4 maxrequests;
count4 headerpadsize; count4 headerpadsize;
switch (bool clientid_confirm) { switch (bool clientid_confirm) {
skipping to change at page 47, line 4 skipping to change at page 48, line 4
case DEFAULT: case DEFAULT:
void; void;
case STREAM: case STREAM:
streamchannelattrs4 streamchanattrs; streamchannelattrs4 streamchanattrs;
case RDMA: case RDMA:
rdmachannelattrs4 rdmachanattrs; rdmachannelattrs4 rdmachanattrs;
}; };
}; };
RESULT RESULT
typedef uint32_t sessionid4; typedef opaque sessionid4[16];
struct CREATESESSION4resok { struct CREATESESSION4resok {
sessionid4 sessionid; sessionid4 sessionid;
bool persist; bool persist;
count4 maxrequestsize; count4 maxrequestsize;
count4 maxresponsesize; count4 maxresponsesize;
count4 maxrequests; count4 maxrequests;
count4 headerpadsize; count4 headerpadsize;
switch (channelmode4 mode) { switch (channelmode4 mode) {
case DEFAULT: case DEFAULT:
void; void;
case STREAM: case STREAM:
streamchannelattrs4 streamchanattrs; streamchannelattrs4 streamchanattrs;
case RDMA: case RDMA:
rdmachannelattrs4 rdmachanattrs; rdmachannelattrs4 rdmachanattrs;
}; };
}; };
union CREATESESSION4res switch (nfsstat4 status) { union CREATESESSION4res switch (nfsstat4 status) {
case NFS4_OK: case NFS4_OK:
CREATE_SESSION4resok resok4; CREATESESSION4resok resok4;
default: default:
void; void;
}; };
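For illustration, the two discriminated unions in CREATESESSION4args encode as any XDR union does: a 4-byte discriminant followed by the selected arm. The sketch below assumes XDR TRUE is encoded as 1 and that verifier4 is the 8-byte fixed opaque of [RFC3530]. Because the RDMA channel attributes are still TBD, only maxrdmareads is encoded. Function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum channelmode4 { DEFAULT = 0, STREAM = 1, RDMA = 2 };

/* Write v in XDR (big-endian) order; return the next write position. */
static unsigned char *xdr_put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* clientid_confirm union: bool discriminant, then the 8-byte verifier
 * only when TRUE (fixed-length opaque, so no length word). */
static unsigned char *put_optverifier(unsigned char *p, int confirm,
                                      const unsigned char verf[8])
{
    p = xdr_put_u32(p, confirm ? 1u : 0u);
    if (confirm) {
        memcpy(p, verf, 8);
        p += 8;
    }
    return p;
}

/* channelmode4 union: enum discriminant, then the selected attributes.
 * DEFAULT is void and STREAM's attributes are currently empty. */
static unsigned char *put_transportattrs(unsigned char *p,
                                         enum channelmode4 mode,
                                         uint32_t maxrdmareads)
{
    p = xdr_put_u32(p, (uint32_t)mode);
    if (mode == RDMA)
        p = xdr_put_u32(p, maxrdmareads);
    return p;
}
```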
DESCRIPTION DESCRIPTION
This operation is used by the client to create new session objects This operation is used by the client to create new session objects
on the server. Additionally the first session created with a new on the server. Additionally the first session created with a new
shorthand client identifier serves to confirm the creation of that shorthand client identifier serves to confirm the creation of that
client's state on the server. The server returns the parameter client's state on the server. The server returns the parameter
skipping to change at page 54, line 37 skipping to change at page 55, line 37
clientid4 clientid; clientid4 clientid;
sessionid4 sessionid; sessionid4 sessionid;
sequenceid4 sequenceid; sequenceid4 sequenceid;
slotid4 slotid; slotid4 slotid;
slotid4 maxslot; slotid4 maxslot;
slotid4 target_maxslot; slotid4 target_maxslot;
}; };
union SEQUENCE4res switch (nfsstat4 status) { union SEQUENCE4res switch (nfsstat4 status) {
case NFS4_OK: case NFS4_OK:
struct SEQUENCE4resok resok4; SEQUENCE4resok resok4;
default: default:
void; void;
}; };
DESCRIPTION DESCRIPTION
The SEQUENCE operation is used to manage operational accounting for The SEQUENCE operation is used to manage operational accounting for
the session on which the operation is sent. The contents include the session on which the operation is sent. The contents include
the client and session to which this request belongs, slotid and the client and session to which this request belongs, slotid and
sequenceid, used by the server to implement session request control sequenceid, used by the server to implement session request control
and the duplicate reply cache semantics, and exchanged slot counts and the duplicate reply cache semantics, and exchanged slot counts
which are used to adjust these values. This operation must appear which are used to adjust these values. This operation must appear
once as the first operation in each COMPOUND and CB_COMPOUND sent once as the first operation in each COMPOUND sent after the channel
after the channel is successfully bound, or a protocol error must is successfully bound, or a protocol error must result.
result.
... ...
ERRORS ERRORS
NFS4ERR_BADSESSION NFS4ERR_BADSESSION
NFS4ERR_BADSLOT NFS4ERR_BADSLOT
6.6. Callback operation: CB_RECALLCREDIT - change flow control limits 6.6. Callback operation: CB_RECALLCREDIT - change flow control limits
skipping to change at page 56, line 5 skipping to change at page 57, line 5
The CB_RECALLCREDIT operation requests the client to return session The CB_RECALLCREDIT operation requests the client to return session
and transport credits to the server, by zero-length RDMA Sends or and transport credits to the server, by zero-length RDMA Sends or
NULL NFSv4 operations. NULL NFSv4 operations.
... ...
ERRORS ERRORS
<none> <none>
6.7. Callback operation: CB_SEQUENCE - Supply callback channel
sequencing and control
SYNOPSIS
control -> control
ARGUMENT
typedef uint32_t sequenceid4;
typedef uint32_t slotid4;
struct CB_SEQUENCE4args {
clientid4 clientid;
sessionid4 sessionid;
sequenceid4 sequenceid;
slotid4 slotid;
slotid4 maxslot;
};
RESULT
struct CB_SEQUENCE4resok {
clientid4 clientid;
sessionid4 sessionid;
sequenceid4 sequenceid;
slotid4 slotid;
slotid4 maxslot;
slotid4 target_maxslot;
};
union CB_SEQUENCE4res switch (nfsstat4 status) {
case NFS4_OK:
CB_SEQUENCE4resok resok4;
default:
void;
};
DESCRIPTION
The CB_SEQUENCE operation is used to manage operational accounting
for the callback channel of the session on which the operation is
sent. The contents include the client and session to which this
request belongs, slotid and sequenceid, used by the client to
implement session request control and the duplicate reply cache
semantics, and exchanged slot counts which are used to adjust these
values. This operation must appear once as the first operation in
each CB_COMPOUND sent after the callback channel is successfully
bound, or a protocol error must result.
...
ERRORS
NFS4ERR_BADSESSION
NFS4ERR_BADSLOT
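As an example of the wire format, CB_SEQUENCE4args encodes to a fixed 36 bytes under standard XDR rules. The encoding is an 8-byte clientid, then the 16-byte sessionid (a fixed-length opaque, so it has no length word and needs no padding), then three 4-byte unsigned integers. The sketch assumes clientid4 is the 64-bit unsigned type from [RFC3530]. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t      clientid4;
typedef unsigned char sessionid4[16];
typedef uint32_t      sequenceid4;
typedef uint32_t      slotid4;

struct CB_SEQUENCE4args {
    clientid4   clientid;
    sessionid4  sessionid;
    sequenceid4 sequenceid;
    slotid4     slotid;
    slotid4     maxslot;
};

#define CB_SEQUENCE4ARGS_XDR_SIZE (8 + 16 + 4 + 4 + 4)

/* Write v in XDR (big-endian) order; return the next write position. */
static unsigned char *xdr_put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

static unsigned char *xdr_put_u64(unsigned char *p, uint64_t v)
{
    p = xdr_put_u32(p, (uint32_t)(v >> 32));  /* XDR hyper: high word first */
    return xdr_put_u32(p, (uint32_t)v);
}

/* Returns the number of bytes written, or 0 if buf is too small. */
static size_t encode_cb_sequence_args(const struct CB_SEQUENCE4args *a,
                                      unsigned char *buf, size_t len)
{
    unsigned char *p = buf;
    if (len < CB_SEQUENCE4ARGS_XDR_SIZE)
        return 0;
    p = xdr_put_u64(p, a->clientid);
    memcpy(p, a->sessionid, sizeof a->sessionid);  /* fixed opaque */
    p += sizeof a->sessionid;
    p = xdr_put_u32(p, a->sequenceid);
    p = xdr_put_u32(p, a->slotid);
    p = xdr_put_u32(p, a->maxslot);
    return (size_t)(p - buf);
}
```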
7. NFSv4 Session Protocol Description 7. NFSv4 Session Protocol Description
This section contains the proposed protocol changes in RPC This section contains the proposed protocol changes in RPC
description language. The constants named in this section are description language. The constants named in this section are
illustrative. When the working group decides on the full content illustrative. When the working group decides on the full content
of the NFSv4.1 minor revision, they may change in order to avoid of the NFSv4.1 minor revision, they may change in order to avoid
conflict. conflict.
NFS4ERR_BADSESSION = 10049,/* invalid session */ NFS4ERR_BADSESSION = 10049,/* invalid session */
NFS4ERR_BADSLOT = 10050 /* invalid slotid */ NFS4ERR_BADSLOT = 10050 /* invalid slotid */
skipping to change at page 57, line 33 skipping to change at page 59, line 33
CREATECLIENTID4resok resok4; CREATECLIENTID4resok resok4;
default: default:
void; void;
}; };
/* /*
* Channel attributes - TBD. * Channel attributes - TBD.
*/ */
enum channelmode4 { enum channelmode4 {
DEFAULT = 0, // don't change DEFAULT = 0, /* don't change */
STREAM = 1, // TCP stream STREAM = 1, /* TCP stream */
RDMA = 2 // upshift to RDMA RDMA = 2 /* upshift to RDMA */
}; };
struct streamchannelattrs4 { struct streamchannelattrs4 {
/* TBD */ opaque nothing[0]; /* TBD */
}; };
struct rdmachannelattrs4 { struct rdmachannelattrs4 {
count4 maxrdmareads; count4 maxrdmareads;
/* plus TBD */ /* plus TBD */
}; };
/* /*
* CREATESESSION: v4.1 session creation and optional * CREATESESSION: v4.1 session creation and optional
* clientid confirm * clientid confirm
*/ */
typedef opaque sessionid4[16]; typedef opaque sessionid4[16];
struct CREATESESSION4args { union optverifier4 switch (bool clientid_confirm) {
clientid4 clientid;
bool persist;
count4 maxrequestsize;
count4 maxresponsesize;
count4 maxrequests;
count4 headerpadsize;
switch (bool clientid_confirm) {
case TRUE: case TRUE:
verifier4 setclientid_confirm; verifier4 setclientid_confirm;
case FALSE: case FALSE:
void; void;
} };
switch (channelmode4 mode) {
union transportattrs4 switch (channelmode4 mode) {
case DEFAULT: case DEFAULT:
void; void;
case STREAM: case STREAM:
streamchannelattrs4 streamchanattrs; streamchannelattrs4 streamchanattrs;
case RDMA: case RDMA:
rdmachannelattrs4 rdmachanattrs; rdmachannelattrs4 rdmachanattrs;
}; };
struct CREATESESSION4args {
clientid4 clientid;
bool persist;
count4 maxrequestsize;
count4 maxresponsesize;
count4 maxrequests;
count4 headerpadsize;
optverifier4 verifier;
transportattrs4 transportattrs;
}; };
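The named unions optverifier4 and transportattrs4 above replace the
inline switches of the earlier CREATESESSION4args. The C sketch below
shows their wire form under standard XDR rules (RFC 4506):

- each discriminant, including a bool, is a 4-byte big-endian value;
- a void arm adds nothing;
- the zero-length opaque in streamchannelattrs4 also adds nothing.

It assumes verifier4 is the 8-byte opaque defined for NFSv4
(NFS4_VERIFIER_SIZE). It also assumes rdmachannelattrs4 holds only
maxrdmareads, since its remaining fields are still TBD. The C
structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum channelmode4 { DEFAULT = 0, STREAM = 1, RDMA = 2 };

/* Hypothetical C mirror of the XDR union transportattrs4. */
struct transportattrs4 {
    enum channelmode4 mode;
    uint32_t maxrdmareads;  /* used only when mode == RDMA */
};

/* Write one XDR unsigned int (4 bytes, big-endian) at buf[off]. */
static size_t put_u32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Encode transportattrs4 into buf (>= 8 bytes); returns bytes written,
 * or 0 for an unknown mode. */
size_t encode_transportattrs4(uint8_t *buf, const struct transportattrs4 *a)
{
    size_t off = put_u32(buf, 0, (uint32_t)a->mode);  /* discriminant */
    switch (a->mode) {
    case DEFAULT:  /* void arm */
    case STREAM:   /* opaque nothing[0]: zero bytes on the wire */
        return off;
    case RDMA:     /* rdmachannelattrs4: count4 maxrdmareads */
        return put_u32(buf, off, a->maxrdmareads);
    }
    return 0;
}

/* Encode optverifier4 into buf (>= 12 bytes); returns bytes written. */
size_t encode_optverifier4(uint8_t *buf, bool confirm, const uint8_t verf[8])
{
    size_t off = put_u32(buf, 0, confirm ? 1u : 0u);  /* XDR bool */
    if (confirm) {
        memcpy(buf + off, verf, 8);  /* 8 is a multiple of 4: no padding */
        off += 8;
    }
    return off;
}
```

An RDMA transportattrs4 with maxrdmareads of 16 therefore encodes as
the 8 bytes 00 00 00 02 00 00 00 10. DEFAULT and STREAM each encode as
a single 4-byte discriminant.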
struct CREATESESSION4resok { struct CREATESESSION4resok {
sessionid4 sessionid; sessionid4 sessionid;
bool persist; bool persist;
count4 maxrequestsize; count4 maxrequestsize;
count4 maxresponsesize; count4 maxresponsesize;
count4 maxrequests; count4 maxrequests;
count4 headerpadsize; count4 headerpadsize;
switch (channelmode4 mode) { transportattrs4 transportattrs;
case DEFAULT:
void;
case STREAM:
streamchannelattrs4 streamchanattrs;
case RDMA:
rdmachannelattrs4 rdmachanattrs;
};
}; };
union CREATESESSION4res switch (nfsstat4 status) { union CREATESESSION4res switch (nfsstat4 status) {
case NFS4_OK: case NFS4_OK:
CREATE_SESSION4resok resok4; CREATESESSION4resok resok4;
default: default:
void; void;
}; };
/* /*
* BIND_BACKCHANNEL: v4.1 callback binding * BIND_BACKCHANNEL: v4.1 callback binding
*/ */
struct BIND_BACKCHANNEL4args { struct BIND_BACKCHANNEL4args {
clientid4 clientid; clientid4 clientid;
uint32_t callback_program; uint32_t callback_program;
skipping to change at page 59, line 21 skipping to change at page 61, line 19
* BIND_BACKCHANNEL: v4.1 callback binding * BIND_BACKCHANNEL: v4.1 callback binding
*/ */
struct BIND_BACKCHANNEL4args { struct BIND_BACKCHANNEL4args {
clientid4 clientid; clientid4 clientid;
uint32_t callback_program; uint32_t callback_program;
uint32_t callback_ident; uint32_t callback_ident;
count4 maxrequestsize; count4 maxrequestsize;
count4 maxresponsesize; count4 maxresponsesize;
count4 maxrequests; count4 maxrequests;
switch (channelmode4 mode) { transportattrs4 transportattrs;
case DEFAULT:
void;
case STREAM:
streamchannelattrs4 streamchanattrs;
case RDMA:
rdmachannelattrs4 rdmachanattrs;
};
}; };
struct BIND_BACKCHANNEL4resok { struct BIND_BACKCHANNEL4resok {
count4 maxrequestsize; count4 maxrequestsize;
count4 maxresponsesize; count4 maxresponsesize;
count4 maxrequests; count4 maxrequests;
switch (channelmode4 mode) { transportattrs4 transportattrs;
case DEFAULT:
void;
case STREAM:
streamchannelattrs4 streamchanattrs;
case RDMA:
rdmachannelattrs4 rdmachanattrs;
};
}; };
union BIND_BACKCHANNEL4res switch (nfsstat4 status) { union BIND_BACKCHANNEL4res switch (nfsstat4 status) {
case NFS4_OK: case NFS4_OK:
BIND_BACKCHANNEL4resok resok4; BIND_BACKCHANNEL4resok resok4;
default: default:
void; void;
}; };
/* /*
skipping to change at page 62, line 12 skipping to change at page 63, line 41
struct CB_RECALLCREDIT4args { struct CB_RECALLCREDIT4args {
sessionid4 sessionid; sessionid4 sessionid;
uint32_t target; uint32_t target;
}; };
struct CB_RECALLCREDIT4res { struct CB_RECALLCREDIT4res {
nfsstat4 status; nfsstat4 status;
}; };
/*
* CB_SEQUENCE: v4.1 operation sequence control
*/
struct CB_SEQUENCE4args {
clientid4 clientid;
sessionid4 sessionid;
sequenceid4 sequenceid;
slotid4 slotid;
slotid4 maxslot;
};
struct CB_SEQUENCE4resok {
clientid4 clientid;
sessionid4 sessionid;
sequenceid4 sequenceid;
slotid4 slotid;
slotid4 maxslot;
slotid4 target_maxslot;
};
union CB_SEQUENCE4res switch (nfsstat4 status) {
case NFS4_OK:
CB_SEQUENCE4resok resok4;
default:
void;
};
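CB_SEQUENCE4res is a union on nfsstat4, so the result body follows
only when the status is NFS4_OK. The C sketch below is a hypothetical
decoder for the server side of the callback channel. Under XDR rules
the result is laid out as follows:

- a 4-byte status;
- clientid4 as an 8-byte unsigned hyper, as in NFSv4;
- the 16-byte sessionid4;
- four 4-byte slot and sequence fields.

That is 44 bytes in all. The structure and function names are
illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of CB_SEQUENCE4resok. */
struct cb_sequence4resok {
    uint64_t clientid;       /* clientid4: XDR unsigned hyper */
    uint8_t  sessionid[16];  /* sessionid4: opaque[16] */
    uint32_t sequenceid, slotid, maxslot, target_maxslot;
};

/* Read an nbytes-wide big-endian unsigned value. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode CB_SEQUENCE4res; returns false if buf is truncated.
 * *ok is filled only when *status is NFS4_OK (0). */
bool decode_cb_sequence4res(const uint8_t *buf, size_t len,
                            uint32_t *status, struct cb_sequence4resok *ok)
{
    if (len < 4)
        return false;
    *status = (uint32_t)get_be(buf, 4);
    if (*status != 0)                 /* default arm: void */
        return true;
    if (len < 4 + 8 + 16 + 4 * 4)     /* 44 bytes for the NFS4_OK arm */
        return false;
    const uint8_t *p = buf + 4;
    ok->clientid = get_be(p, 8);                  p += 8;
    memcpy(ok->sessionid, p, 16);                 p += 16;
    ok->sequenceid     = (uint32_t)get_be(p, 4);  p += 4;
    ok->slotid         = (uint32_t)get_be(p, 4);  p += 4;
    ok->maxslot        = (uint32_t)get_be(p, 4);  p += 4;
    ok->target_maxslot = (uint32_t)get_be(p, 4);
    return true;
}
```

A truncated buffer yields false rather than a partially filled result.
An error status such as NFS4ERR_BADSLOT (10050) decodes from the first
four bytes alone.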
/* Operation values */ /* Operation values */
OP_CB_RECALLCREDIT = 5, OP_CB_RECALLCREDIT = 5,
OP_CB_SEQUENCE = 6
/* Operation arguments */ /* Operation arguments */
case OP_CB_RECALLCREDIT: case OP_CB_RECALLCREDIT:
CB_RECALLCREDIT4args opcbrecallcredit; CB_RECALLCREDIT4args opcbrecallcredit;
case OP_CB_SEQUENCE:
CB_SEQUENCE4args opcbsequence;
/* Operation results */ /* Operation results */
case OP_CB_RECALLCREDIT: case OP_CB_RECALLCREDIT:
CB_RECALLCREDIT4res opcbrecallcredit; CB_RECALLCREDIT4res opcbrecallcredit;
case OP_CB_SEQUENCE:
CB_SEQUENCE4res opcbsequence;
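The placement rule stated for CB_SEQUENCE in section 6.7 can be
checked from the operation numbers alone. The C function below is a
hypothetical sketch. It applies the rule only after the callback
channel has been bound, as the DESCRIPTION there specifies. It leaves
the choice of protocol error to the caller, since this draft does not
yet name one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Operation value from the XDR above. */
#define OP_CB_SEQUENCE 6

/*
 * Hypothetical check of one CB_COMPOUND's operation list: once the
 * callback channel is bound, CB_SEQUENCE must be the first operation
 * and must not appear again.
 */
bool cb_compound_order_ok(const uint32_t *ops, size_t nops,
                          bool channel_bound)
{
    if (!channel_bound)
        return true;        /* rule is stated only for bound channels */
    if (nops == 0 || ops[0] != OP_CB_SEQUENCE)
        return false;       /* missing, or not the first operation */
    for (size_t i = 1; i < nops; i++) {
        if (ops[i] == OP_CB_SEQUENCE)
            return false;   /* appears more than once */
    }
    return true;
}
```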
8. Acknowledgements 8. Acknowledgements
The authors wish to acknowledge the valuable contributions and The authors wish to acknowledge the valuable contributions and
review of Charles Antonelli, Brent Callaghan, Mike Eisler, John review of Charles Antonelli, Brent Callaghan, Mike Eisler, John
Howard, Chet Juszczak, Trond Myklebust, Dave Noveck, John Scott, Howard, Chet Juszczak, Trond Myklebust, Dave Noveck, John Scott,
Mike Stolarchuk and Mark Wittle. Mike Stolarchuk and Mark Wittle.
9. References 9. References
skipping to change at page 63, line 23 skipping to change at page 65, line 44
[DCK+03] [DCK+03]
M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T. M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T.
Talpey, M. Wittle, "The Direct Access File System", in Talpey, M. Wittle, "The Direct Access File System", in
Proceedings of 2nd USENIX Conference on File and Storage Proceedings of 2nd USENIX Conference on File and Storage
Technologies (FAST '03), San Francisco, CA, March 31 - April Technologies (FAST '03), San Francisco, CA, March 31 - April
2, 2003 2, 2003
[DDP] [DDP]
H. Shah, J. Pinkerton, R. Recio, P. Culley, "Direct Data H. Shah, J. Pinkerton, R. Recio, P. Culley, "Direct Data
Placement over Reliable Transports", Placement over Reliable Transports",
http://www.ietf.org/internet-drafts/draft-ietf-rddp-ddp-01 http://www.ietf.org/internet-drafts/draft-ietf-rddp-ddp-03
[FJDAFS] [FJDAFS]
Fujitsu Prime Software Technologies, "Meet the DAFS Fujitsu Prime Software Technologies, "Meet the DAFS
Performance with DAFS/VI Kernel Implementation using cLAN", Performance with DAFS/VI Kernel Implementation using cLAN",
http://www.pst.fujitsu.com/english/dafsdemo/index.html http://www.pst.fujitsu.com/english/dafsdemo/index.html
[FJNFS] [FJNFS]
Fujitsu Prime Software Technologies, "An Adaptation of VIA to Fujitsu Prime Software Technologies, "An Adaptation of VIA to
NFS on Linux", NFS on Linux",
http://www.pst.fujitsu.com/english/nfs/index.html http://www.pst.fujitsu.com/english/nfs/index.html
skipping to change at page 64, line 12 skipping to change at page 66, line 30
Proceedings of 2002 USENIX Annual Technical Conference, Proceedings of 2002 USENIX Annual Technical Conference,
Monterey, CA, June 9-14, 2002. Monterey, CA, June 9-14, 2002.
[MIDTAX] [MIDTAX]
B. Carpenter, S. Brim, "Middleboxes: Taxonomy and Issues", B. Carpenter, S. Brim, "Middleboxes: Taxonomy and Issues",
Informational RFC, http://www.ietf.org/rfc/rfc3234 Informational RFC, http://www.ietf.org/rfc/rfc3234
[NFSDDP] [NFSDDP]
B. Callaghan, T. Talpey, "NFS Direct Data Placement", B. Callaghan, T. Talpey, "NFS Direct Data Placement",
Internet-Draft Work in Progress, http://www.ietf.org/internet- Internet-Draft Work in Progress, http://www.ietf.org/internet-
drafts/draft-ietf-nfsv4-nfsdirect-00 drafts/draft-ietf-nfsv4-nfsdirect-01
[NFSPS] [NFSPS]
T. Talpey, C. Juszczak, "NFS RDMA Problem Statement", T. Talpey, C. Juszczak, "NFS RDMA Problem Statement",
Internet-Draft Work in Progress, http://www.ietf.org/internet- Internet-Draft Work in Progress, http://www.ietf.org/internet-
drafts/draft-ietf-nfsv4-nfs-rdma-problem-statement-00 drafts/draft-ietf-nfsv4-nfs-rdma-problem-statement-02
[RDMAREQ]
B. Callaghan, M. Wittle, "NFS RDMA requirements", Internet-
Draft Work in Progress, http://www.ietf.org/internet-
drafts/draft-callaghan-nfs-rdmareq-00
[RDDP] [RDDP]
Remote Direct Data Placement Working Group charter, Remote Direct Data Placement Working Group charter,
http://www.ietf.org/html.charters/rddp-charter.html http://www.ietf.org/html.charters/rddp-charter.html
[RDDPPS] [RDDPPS]
A. Romanow, J. Mogul, T. Talpey, S. Bailey, Remote Direct Data A. Romanow, J. Mogul, T. Talpey, S. Bailey, Remote Direct Data
Placement Working Group Problem Statement, Internet-Draft Work Placement Working Group Problem Statement, Internet-Draft Work
in Progress, http://www.ietf.org/internet-drafts/draft-ietf- in Progress, http://www.ietf.org/internet-drafts/draft-ietf-
rddp-problem-statement-04 rddp-problem-statement-05
[RDMAP] [RDMAP]
R. Recio, P. Culley, D. Garcia, J. Hilland, "An RDMA Protocol R. Recio, P. Culley, D. Garcia, J. Hilland, "An RDMA Protocol
Specification", Internet-Draft Work in Progress, Specification", Internet-Draft Work in Progress,
http://www.ietf.org/internet-drafts/draft-ietf-rddp-rdmap-01 http://www.ietf.org/internet-drafts/draft-ietf-rddp-rdmap-03
[RPCRDMA] [RPCRDMA]
B. Callaghan, T. Talpey, "RDMA Transport for ONC RPC" B. Callaghan, T. Talpey, "RDMA Transport for ONC RPC"
Internet-Draft Work in Progress, http://www.ietf.org/internet- Internet-Draft Work in Progress, http://www.ietf.org/internet-
drafts/draft-ietf-nfsv4-rpcrdma-00 drafts/draft-ietf-nfsv4-rpcrdma-01
[RFC2203] [RFC2203]
M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol
Specification", Standards Track RFC, Specification", Standards Track RFC,
http://www.ietf.org/rfc/rfc2203 http://www.ietf.org/rfc/rfc2203
[RW96] [RW96]
R. Werme, "RPC XID Issues", Connectathon 1996, San Jose, CA, R. Werme, "RPC XID Issues", Connectathon 1996, San Jose, CA,
http://www.cthon.org/talks96/werme1.pdf http://www.cthon.org/talks96/werme1.pdf
skipping to change at page 65, line 37 skipping to change at page 68, line 7
University of Michigan University of Michigan
Center for Information Technology Integration Center for Information Technology Integration
535 W. William St. Suite 3100 535 W. William St. Suite 3100
Ann Arbor, MI 48103 USA Ann Arbor, MI 48103 USA
Phone: +1 734 615-4782 Phone: +1 734 615-4782
Email: baumanj@umich.edu Email: baumanj@umich.edu
11. Full Copyright Statement 11. Full Copyright Statement
Copyright (C) The Internet Society (2004). This document is Copyright (C) The Internet Society (2005). This document is
subject to the rights, licenses and restrictions contained in BCP subject to the rights, licenses and restrictions contained in BCP
78 and except as set forth therein, the authors retain all their 78 and except as set forth therein, the authors retain all their
rights. rights.
This document and the information contained herein are provided on This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/