draft-ietf-nfsv4-nfsdirect-08.txt   rfc5667.txt 
NFSv4 Working Group Tom Talpey Internet Engineering Task Force (IETF) T. Talpey
Internet-Draft NetApp Request for Comments: 5667 Unaffiliated
Intended status: Standards Track Brent Callaghan Category: Standards Track B. Callaghan
Expires: October 17, 2008 Apple ISSN: 2070-1721 Apple
April 16, 2008 January 2010
NFS Direct Data Placement Network File System (NFS) Direct Data Placement
draft-ietf-nfsv4-nfsdirect-08
Status of this Memo Abstract
By submitting this Internet-Draft, each author represents that any This document defines the bindings of the various Network File System
applicable patent or other IPR claims of which he or she is aware (NFS) versions to the Remote Direct Memory Access (RDMA) operations
have been or will be disclosed, and any of which he or she becomes supported by the RPC/RDMA transport protocol. It describes the use
aware will be disclosed, in accordance with Section 6 of BCP 79. of direct data placement by means of server-initiated RDMA operations
into client-supplied buffers for implementations of NFS versions 2,
3, 4, and 4.1 over such an RDMA transport.
Internet-Drafts are working documents of the Internet Engineering Status of This Memo
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six This is an Internet Standards Track document.
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at This document is a product of the Internet Engineering Task Force
http://www.ietf.org/ietf/1id-abstracts.txt (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
The list of Internet-Draft Shadow Directories can be accessed at Information about the current status of this document, any errata,
http://www.ietf.org/shadow.html. and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc5667.
Abstract Copyright Notice
This draft defines the bindings of the various Network File System Copyright (c) 2010 IETF Trust and the persons identified as the
(NFS) versions to the Remote Direct Memory Access (RDMA) operations document authors. All rights reserved.
supported by the RPC/RDMA transport protocol. It describes the use
of direct data placement by means of server-initiated RDMA operations
into client-supplied buffers for implementations of NFS versions 2,
3, 4 and 4.1 over such an RDMA transport.
Table of Contents This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 This document may contain material from IETF Documents or IETF
2. Transfers from NFS Client to NFS Server . . . . . . . . . . 2 Contributions published or made publicly available before November
3. Transfers from NFS Server to NFS Client . . . . . . . . . . 3 10, 2008. The person(s) controlling the copyright in some of this
4. NFS Versions 2 and 3 Mapping . . . . . . . . . . . . . . . . 4 material may not have granted the IETF Trust the right to allow
5. NFS Version 4 Mapping . . . . . . . . . . . . . . . . . . . 5 modifications of such material outside the IETF Standards Process.
5.1. NFS Version 4 Callbacks . . . . . . . . . . . . . . . . . 7 Without obtaining an adequate license from the person(s) controlling
6. Port Usage Considerations . . . . . . . . . . . . . . . . . 8 the copyright in such materials, this document may not be modified
7. Security Considerations . . . . . . . . . . . . . . . . . . 8 outside the IETF Standards Process, and derivative works of it may
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . 8 not be created outside the IETF Standards Process, except to format
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . 9 it for publication as an RFC or to translate it into languages other
10. Normative References . . . . . . . . . . . . . . . . . . . 9 than English.
11. Informative References . . . . . . . . . . . . . . . . . 10
12. Authors' Addresses . . . . . . . . . . . . . . . . . . . 10
13. Intellectual Property and Copyright Statements . . . . . 11
Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 11
Requirements Language Table of Contents
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 1. Introduction ....................................................2
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 1.1. Requirements Language ......................................2
document are to be interpreted as described in [RFC2119]. 2. Transfers from NFS Client to NFS Server .........................3
3. Transfers from NFS Server to NFS Client .........................3
4. NFS Versions 2 and 3 Mapping ....................................4
5. NFS Version 4 Mapping ...........................................6
5.1. NFS Version 4 Callbacks ....................................7
6. Port Usage Considerations .......................................8
7. Security Considerations .........................................9
8. Acknowledgments .................................................9
9. References ......................................................9
9.1. Normative References .......................................9
9.2. Informative References ....................................10
1. Introduction 1. Introduction
The Remote Direct Memory Access (RDMA) Transport for Remote Procedure The Remote Direct Memory Access (RDMA) Transport for Remote Procedure
Calls (RPC) [RPCRDMA] allows an RPC client application to post Call (RPC) [RFC5666] allows an RPC client application to post buffers
buffers in a Chunk list for specific arguments and results from an in a Chunk list for specific arguments and results from an RPC call.
RPC call. The RDMA transport header conveys this list of client The RDMA transport header conveys this list of client buffer
buffer addresses to the server where the application can associate addresses to the server where the application can associate them with
them with client data and use RDMA operations to transfer the results client data and use RDMA operations to transfer the results directly
directly to and from the posted buffers on the client. The client to and from the posted buffers on the client. The client and server
and server must agree on a consistent mapping of posted buffers to must agree on a consistent mapping of posted buffers to RPC. This
RPC. This document details the mapping for each version of the NFS document details the mapping for each version of the NFS protocol
protocol [RFC1094] [RFC1813] [RFC3530] [NFSv4.1]. [RFC1094] [RFC1813] [RFC3530] [RFC5661].
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Transfers from NFS Client to NFS Server 2. Transfers from NFS Client to NFS Server
The RDMA Read list, in the RDMA transport header, allows an RPC The RDMA Read list, in the RDMA transport header, allows an RPC
client to marshal RPC call data selectively. Large chunks of data, client to marshal RPC call data selectively. Large chunks of data,
such as the file data of an NFS WRITE request, MAY be referenced by such as the file data of an NFS WRITE request, MAY be referenced by
an RDMA Read list and be moved efficiently and directly-placed by an an RDMA Read list and be moved efficiently and directly placed by an
RDMA Read operation initiated by the server. RDMA Read operation initiated by the server.
The process of identifying these chunks for the RDMA Read list can be The process of identifying these chunks for the RDMA Read list can be
implemented entirely within the RPC layer. It is transparent to the implemented entirely within the RPC layer. It is transparent to the
upper-level protocol, such as NFS. For instance, the file data upper-level protocol, such as NFS. For instance, the file data
portion of an NFS WRITE request can be selected as an RDMA "chunk" portion of an NFS WRITE request can be selected as an RDMA "chunk"
within the eXternal Data Representation (XDR) marshaling code of RPC within the eXternal Data Representation (XDR) marshaling code of RPC
based on a size criterion, independently of the NFS protocol layer. based on a size criterion, independently of the NFS protocol layer.
The XDR unmarshaling on the receiving system can identify the The XDR unmarshaling on the receiving system can identify the
correspondence between Read chunks and protocol elements via the XDR correspondence between Read chunks and protocol elements via the XDR
position value encoded in the Read chunk entry. position value encoded in the Read chunk entry.
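The size-based chunk selection and position-based reassembly described above can be sketched as follows. This is a simplified illustration, not the RPC/RDMA wire format: the call is modeled as a list of already-encoded byte strings, and the names split_call, rejoin_call, and the threshold parameter are hypothetical. Real XDR length words and padding are ignored here; [RFC5666] defines the actual chunk encoding.

```python
from typing import List, Tuple

Chunk = Tuple[int, bytes]  # (XDR position, data moved out of the inline stream)


def split_call(pieces: List[bytes], threshold: int) -> Tuple[bytes, List[Chunk]]:
    """Sender side: move every piece of at least `threshold` bytes into a chunk.

    The position recorded for a chunk is its byte offset in the full,
    unreduced stream, so the receiver can tell where the data belongs.
    """
    inline = bytearray()
    chunks: List[Chunk] = []
    position = 0
    for piece in pieces:
        if len(piece) >= threshold:
            chunks.append((position, piece))
        else:
            inline += piece
        position += len(piece)
    return bytes(inline), chunks


def rejoin_call(inline: bytes, chunks: List[Chunk]) -> bytes:
    """Receiver side: re-insert each chunk at its recorded position."""
    out = bytearray()
    consumed = 0  # number of inline bytes already copied
    for position, data in sorted(chunks):
        take = position - len(out)  # inline bytes that precede this chunk
        out += inline[consumed:consumed + take]
        consumed += take
        out += data
    out += inline[consumed:]
    return bytes(out)


# A WRITE-like call: small header, large file data, small trailer.
pieces = [b"HDR!", b"x" * 16, b"TAIL"]
inline, chunks = split_call(pieces, threshold=8)
```

In this sketch the NFS layer never sees the split: only the marshaling code consults the threshold, which mirrors the statement that chunk selection is transparent to the upper-level protocol.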
RPC RDMA Read chunks are employed by this NFS mapping to convey RPC RDMA Read chunks are employed by this NFS mapping to convey
specific NFS data to the server in a manner which may be directly specific NFS data to the server in a manner that may be directly
placed. The following sections describe this mapping for versions of placed. The following sections describe this mapping for versions of
the NFS protocol. the NFS protocol.
3. Transfers from NFS Server to NFS Client 3. Transfers from NFS Server to NFS Client
The RDMA Write list, in the RDMA transport header, allows the client The RDMA Write list, in the RDMA transport header, allows the client
to post one or more buffers into which the server will RDMA Write to post one or more buffers into which the server will RDMA Write
designated result chunks directly. If the client sends a null Write designated result chunks directly. If the client sends a null Write
list, then results from the RPC call will be returned as either an list, then results from the RPC call will be returned either as an
inline reply, as chunks in an RDMA Read list of server-posted inline reply, as chunks in an RDMA Read list of server-posted
buffers, or in a client-posted reply buffer. buffers, or in a client-posted reply buffer.
Each posted buffer in a Write list is represented as an array of Each posted buffer in a Write list is represented as an array of
memory segments. This allows the client some flexibility in memory segments. This allows the client some flexibility in
submitting discontiguous memory segments into which the server will submitting discontiguous memory segments into which the server will
scatter the result. Each segment is described by a triplet scatter the result. Each segment is described by a triplet
consisting of the segment handle or steering tag (STag), segment consisting of the segment handle or steering tag (STag), segment
length, and memory address or offset. length, and memory address or offset.
skipping to change at page 4, line 10 skipping to change at page 4, line 18
The sum of the segment lengths yields the total size of the buffer, The sum of the segment lengths yields the total size of the buffer,
which MUST be large enough to accept the result. If the buffer is which MUST be large enough to accept the result. If the buffer is
too small, the server MUST return an XDR encode error. The server too small, the server MUST return an XDR encode error. The server
MUST return the result data for a posted buffer by progressively MUST return the result data for a posted buffer by progressively
filling its segments, perhaps leaving some trailing segments unfilled filling its segments, perhaps leaving some trailing segments unfilled
or partially full if the size of the result is less than the total or partially full if the size of the result is less than the total
size of the buffer segments. size of the buffer segments.
The server returns the RDMA Write list to the client with the segment The server returns the RDMA Write list to the client with the segment
length fields overwritten to indicate the amount of data RDMA Written length fields overwritten to indicate the amount of data RDMA written
to each segment. Results returned by direct placement MUST NOT be to each segment. Results returned by direct placement MUST NOT be
returned by other methods, e.g., by Read chunk list or inline. If no returned by other methods, e.g., by Read chunk list or inline. If no
result data at all is returned for the element, the server places no result data at all is returned for the element, the server places no
data in the buffer(s), but does return zeroes in the segment length data in the buffer(s), but does return zeros in the segment length
fields corresponding to the result. fields corresponding to the result.
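The segment-filling rule and the returned length fields can be sketched as below. This is a minimal model under stated assumptions: the Segment class, the fill_segments function, and the XdrEncodeError exception are hypothetical names introduced for illustration, and no actual RDMA Write is performed. Only the length accounting is shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    handle: int  # segment handle or steering tag (STag)
    length: int  # segment length in bytes
    offset: int  # memory address or offset


class XdrEncodeError(Exception):
    """Hypothetical stand-in for the XDR encode error the server returns."""


def fill_segments(segments: List[Segment], result_len: int) -> List[Segment]:
    """Return the Write list entry as the server would send it back.

    Segments are filled in order; each returned length field holds the
    number of bytes written to that segment (zero if it was not reached).
    """
    if sum(seg.length for seg in segments) < result_len:
        raise XdrEncodeError("posted buffer is too small for the result")
    returned = []
    remaining = result_len
    for seg in segments:
        written = min(seg.length, remaining)
        remaining -= written
        returned.append(Segment(seg.handle, written, seg.offset))
    return returned


# Three 4096-byte segments posted by the client, 5000 bytes of result data.
posted = [Segment(1, 4096, 0), Segment(2, 4096, 4096), Segment(3, 4096, 8192)]
reply = fill_segments(posted, 5000)
print([seg.length for seg in reply])  # [4096, 904, 0]
```

A result of zero bytes yields all-zero length fields, which corresponds to the case where no result data at all is returned for the element.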
The RDMA Write list allows the client to provide multiple result The RDMA Write list allows the client to provide multiple result
buffers - each buffer maps to a specific result in the reply. The buffers -- each buffer maps to a specific result in the reply. The
NFS client and server implementations agree by specifying the mapping NFS client and server implementations agree by specifying the mapping
of results to buffers for each RPC procedure. The following sections of results to buffers for each RPC procedure. The following sections
describe this mapping for versions of the NFS protocol. describe this mapping for versions of the NFS protocol.
Through the use of RDMA Write lists in NFS requests, it is not Through the use of RDMA Write lists in NFS requests, it is not
necessary to employ the RDMA Read lists in the NFS replies, as necessary to employ the RDMA Read lists in the NFS replies, as
described in the RPC/RDMA protocol. This enables more efficient described in the RPC/RDMA protocol. This enables more efficient
operation, by avoiding the need for the server to expose buffers for operation, by avoiding the need for the server to expose buffers for
RDMA, and also avoiding "RDMA_DONE" exchanges. Clients MAY RDMA, and also avoiding "RDMA_DONE" exchanges. Clients MAY
additionally employ RDMA Reply chunks to receive entire messages, as additionally employ RDMA Reply chunks to receive entire messages, as
described in [RPCRDMA]. described in [RFC5666].
4. NFS Versions 2 and 3 Mapping 4. NFS Versions 2 and 3 Mapping
A single RDMA Write list entry MAY be posted by the client to receive A single RDMA Write list entry MAY be posted by the client to receive
either the opaque file data from a READ request or the pathname from either the opaque file data from a READ request or the pathname from
a READLINK request. The server MUST ignore a Write list for any a READLINK request. The server MUST ignore a Write list for any
other NFS procedure, as well as any Write list entries beyond the other NFS procedure, as well as any Write list entries beyond the
first in the list. first in the list.
Similarly, a single RDMA Read list entry MAY be posted by the client Similarly, a single RDMA Read list entry MAY be posted by the client
skipping to change at page 5, line 6 skipping to change at page 5, line 13
the first in the list. the first in the list.
Because there are no NFS version 2 or 3 requests that transfer bulk Because there are no NFS version 2 or 3 requests that transfer bulk
data in both directions, it is not necessary to post requests data in both directions, it is not necessary to post requests
containing both Write and Read lists. Any unneeded Read or Write containing both Write and Read lists. Any unneeded Read or Write
lists are ignored by the server. lists are ignored by the server.
In the case where the outgoing request or expected incoming reply is In the case where the outgoing request or expected incoming reply is
larger than the maximum size supported on the connection, it is larger than the maximum size supported on the connection, it is
possible for the RPC layer to post the entire message or result in a possible for the RPC layer to post the entire message or result in a
special "RDMA_NOMSG" message type which is transferred entirely by special "RDMA_NOMSG" message type that is transferred entirely by
RDMA. This is implemented in RPC, below NFS and therefore has no RDMA. This is implemented in RPC, below NFS, and therefore has no
effect on the message contents. effect on the message contents.
Non-RDMA (inline) WRITE transfers MAY OPTIONALLY employ the Non-RDMA (inline) WRITE transfers MAY OPTIONALLY employ the
"RDMA_MSGP" padding method described in the RPC/RDMA protocol, if the "RDMA_MSGP" padding method described in the RPC/RDMA protocol, if the
appropriate value for the server is known to the client. Padding appropriate value for the server is known to the client. Padding
allows the opaque file data to arrive at the server in an aligned allows the opaque file data to arrive at the server in an aligned
fashion, which may improve server performance. fashion, which may improve server performance.
The NFS version 2 and 3 protocols are frequently limited in practice The NFS version 2 and 3 protocols are frequently limited in practice
to requests containing less than or equal to 8 kilobytes and 32 to requests containing less than or equal to 8 kilobytes and 32
kilobytes of data, respectively. In these cases, it is often kilobytes of data, respectively. In these cases, it is often
practical to support basic operation without employing a practical to support basic operation without employing a
configuration exchange as discussed in [RPCRDMA]. The server MUST configuration exchange as discussed in [RFC5666]. The server MUST
post buffers large enough to receive the largest possible incoming post buffers large enough to receive the largest possible incoming
message (approximately 12KB for NFS version 2, or 36KB for NFS message (approximately 12 KB for NFS version 2, or 36 KB for NFS
version 3, would be vastly sufficient), and the client can post version 3, would be vastly sufficient), and the client can post
buffers large enough to receive replies based on the "rsize" it is buffers large enough to receive replies based on the "rsize" it is
using to the server, plus a fixed overhead for the RPC and NFS using to the server, plus a fixed overhead for the RPC and NFS
headers. Because the server MUST NOT return data in excess of this headers. Because the server MUST NOT return data in excess of this
size, the client can be assured of the adequacy of its posted buffer size, the client can be assured of the adequacy of its posted buffer
sizes. sizes.
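The sizing arithmetic above can be written out as a small sketch. The 4 KB header allowance is an assumption inferred from the approximate 12 KB and 36 KB figures quoted in the text; it is not a value defined by this document, and the function name receive_buffer_size is hypothetical.

```python
KB = 1024

# Assumption: allowance for RPC and NFS headers, inferred from the
# "approximately 12 KB" and "36 KB" figures; not a normative value.
HEADER_ALLOWANCE = 4 * KB


def receive_buffer_size(max_payload: int, header_allowance: int = HEADER_ALLOWANCE) -> int:
    """Size of a posted receive buffer: largest payload plus fixed overhead."""
    return max_payload + header_allowance


# Server side: largest incoming message for typical NFS version 2 and 3 limits.
print(receive_buffer_size(8 * KB) // KB)   # 12
print(receive_buffer_size(32 * KB) // KB)  # 36
```

A client would apply the same calculation with its own "rsize" as the payload bound.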
Flow control is handled dynamically by the RPC RDMA protocol, and Flow control is handled dynamically by the RPC RDMA protocol, and
write padding is OPTIONAL and therefore MAY remain unused. write padding is OPTIONAL and therefore MAY remain unused.
Alternatively, if the server is administratively configured to values Alternatively, if the server is administratively configured to values
appropriate for all its clients, the same assurance of appropriate for all its clients, the same assurance of
interoperability within the domain can be made. interoperability within the domain can be made.
The use of a configuration protocol with NFS v2 and v3 is therefore The use of a configuration protocol with NFS v2 and v3 is therefore
OPTIONAL. Employing a configuration exchange may allow some OPTIONAL. Employing a configuration exchange may allow some
advantage to server resource management through accurately sizing advantage to server resource management through accurately sizing
buffers, enabling the server to know exactly how many RDMA Reads may buffers, enabling the server to know exactly how many RDMA Reads may
be in progress at once on the client connection, and enabling client be in progress at once on the client connection, and enabling client
write padding which may be desirable for certain servers when RDMA write padding, which may be desirable for certain servers when RDMA
Read is impractical. Read is impractical.
5. NFS Version 4 Mapping 5. NFS Version 4 Mapping
This specification applies to the first minor version of NFS version This specification applies to the first minor version of NFS version
4 (NFSv4.0) and any subsequent minor versions that do not override 4 (NFSv4.0) and any subsequent minor versions that do not override
this mapping. this mapping.
The Write list MUST be considered only for the COMPOUND procedure. The Write list MUST be considered only for the COMPOUND procedure.
This procedure returns results from a sequence of operations. Only This procedure returns results from a sequence of operations. Only
the opaque file data from an NFS READ operation, and the pathname the opaque file data from an NFS READ operation and the pathname from
from a READLINK operation MUST utilize entries from the Write list. a READLINK operation MUST utilize entries from the Write list.
If there is no Write list, i.e., the list is null, then any READ or If there is no Write list, i.e., the list is null, then any READ or
READLINK operations in the COMPOUND MUST return their data inline. READLINK operations in the COMPOUND MUST return their data inline.
The NFSv4.0 client MUST ensure in this case that any result of its The NFSv4.0 client MUST ensure in this case that any result of its
READ and READLINK requests will fit within its receive buffers, in READ and READLINK requests will fit within its receive buffers, in
order to avoid a resulting RDMA transport error upon transfer. The order to avoid a resulting RDMA transport error upon transfer. The
server is not required to detect this. server is not required to detect this.
The first entry in the Write list MUST be used by the first READ or The first entry in the Write list MUST be used by the first READ or
READLINK in the COMPOUND request. The next Write list entry by the READLINK in the COMPOUND request. The next Write list entry is used
by the next READ or READLINK, and so on. If there are more READ or by the next READ or READLINK, and so on. If there are more READ or
READLINK operations than Write list entries, then any remaining READLINK operations than Write list entries, then any remaining
operations MUST return their results inline. operations MUST return their results inline.
If a Write list entry is presented, then the corresponding READ or If a Write list entry is presented, then the corresponding READ or
READLINK MUST return its data via an RDMA Write to the buffer READLINK MUST return its data via an RDMA Write to the buffer
indicated by the Write list entry. If the Write list entry has zero indicated by the Write list entry. If the Write list entry has zero
RDMA segments, or if the total size of the segments is zero, then the RDMA segments, or if the total size of the segments is zero, then the
corresponding READ or READLINK operation MUST return its result corresponding READ or READLINK operation MUST return its result
inline. inline.
skipping to change at page 6, line 47 skipping to change at page 7, line 6
A --> B --> C A --> B --> C
Compound request: Compound request:
PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ
| | | | | |
v v v v v v
A B C A B C
If the client does not want to have the READLINK result returned If the client does not want to have the READLINK result returned
directly, then it provides a zero length array of segment triplets directly, then it provides a zero-length array of segment triplets
for buffer B or sets the values in the segment triplet for buffer B for buffer B or sets the values in the segment triplet for buffer B
to zeros so that the READLINK result MUST be returned inline. to zeros so that the READLINK result MUST be returned inline.
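The assignment of Write list entries to operations in a COMPOUND, as in the example above, can be sketched as follows. This is an illustrative model only: the function name plan_results is hypothetical, and each Write list entry is reduced to a list of segment lengths rather than full segment triplets.

```python
from typing import List, Optional, Tuple


def plan_results(ops: List[str],
                 write_list: Optional[List[List[int]]]) -> List[Tuple[int, str, str]]:
    """Decide how each READ or READLINK in a COMPOUND returns its data.

    `write_list` is None for a null Write list; otherwise each entry is a
    list of segment lengths. Returns (operation index, operation, method)
    tuples, where method is "rdma-write" or "inline".
    """
    entries = iter(write_list or [])
    plan = []
    for index, op in enumerate(ops):
        if op not in ("READ", "READLINK"):
            continue  # other operations never consume a Write list entry
        entry = next(entries, None)  # None once the Write list is exhausted
        if entry is None or sum(entry) == 0:
            method = "inline"  # no entry, zero segments, or zero total size
        else:
            method = "rdma-write"
        plan.append((index, op, method))
    return plan


compound = ["PUTFH", "LOOKUP", "READ",
            "PUTFH", "LOOKUP", "READLINK",
            "PUTFH", "LOOKUP", "READ"]

# Buffers A and C are usable; buffer B is a zero-length array of segments.
print(plan_results(compound, [[4096], [], [8192]]))
```

Note that in this sketch an empty or zero-sized entry still occupies its position in the list, so buffer C remains paired with the final READ.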
The situation is similar for RDMA Read lists sent by the client and The situation is similar for RDMA Read lists sent by the client and
applies to the NFSv4.0 WRITE and SYMLINK procedures as for v3. applies to the NFSv4.0 WRITE and SYMLINK procedures as for v3.
Additionally, inline segments too large to fit in posted buffers MAY Additionally, inline segments too large to fit in posted buffers MAY
be transferred in special "RDMA_NOMSG" messages. be transferred in special "RDMA_NOMSG" messages.
Non-RDMA (inline) WRITE transfers MAY OPTIONALLY employ the Non-RDMA (inline) WRITE transfers MAY OPTIONALLY employ the
"RDMA_MSGP" padding method described in the RPC/RDMA protocol, if the "RDMA_MSGP" padding method described in the RPC/RDMA protocol, if the
appropriate value for the server is known to the client. Padding appropriate value for the server is known to the client. Padding
allows the opaque file data to arrive at the server in an aligned allows the opaque file data to arrive at the server in an aligned
fashion, which may improve server performance. In order to ensure fashion, which may improve server performance. In order to ensure
accurate alignment for all data, it is likely that the client will accurate alignment for all data, it is likely that the client will
restrict its use of OPTIONAL padding to COMPOUND requests containing restrict its use of OPTIONAL padding to COMPOUND requests containing
only a single WRITE operation. only a single WRITE operation.
Unlike NFS versions 2 and 3, the maximum size of an NFS version 4 Unlike NFS versions 2 and 3, the maximum size of an NFS version 4
COMPOUND is not bounded, even when RDMA chunks are in use. While it COMPOUND is not bounded, even when RDMA chunks are in use. While it
might appear that a configuration protocol exchange (such as the one might appear that a configuration protocol exchange (such as the one
described in [RPCRDMA]) would help, in fact the layering issues described in [RFC5666]) would help, in fact the layering issues
involved in building COMPOUNDs by NFS make such a mechanism involved in building COMPOUNDs by NFS make such a mechanism
unworkable. unworkable.
However, typical NFS version 4 clients rarely issue such problematic However, typical NFS version 4 clients rarely issue such problematic
requests. In practice, they behave in much more predictable ways, in requests. In practice, they behave in much more predictable ways, in
fact most still support the traditional rsize/wsize mount parameters. fact most still support the traditional rsize/wsize mount parameters.
Therefore, most NFS version 4 clients function over RPC/RDMA in the Therefore, most NFS version 4 clients function over RPC/RDMA in the
same way as NFS versions 2 and 3, operationally. same way as NFS versions 2 and 3, operationally.
There are however advantages to allowing both client and server to There are however advantages to allowing both client and server to
operate with prearranged size constraints, for example use of the operate with prearranged size constraints, for example, use of the
sizes to better manage the server's response cache. An extension to sizes to better manage the server's response cache. An extension to
NFS version 4 supporting a more comprehensive exchange of upper layer NFS version 4 supporting a more comprehensive exchange of upper-layer
parameters is part of [NFSv4.1]. parameters is part of [RFC5661].
5.1. NFS Version 4 Callbacks 5.1. NFS Version 4 Callbacks
The NFS version 4 protocols support server-initiated callbacks to The NFS version 4 protocols support server-initiated callbacks to
selected clients, in order to notify them of events such as recalled selected clients, in order to notify them of events such as recalled
delegations, etc. These callbacks present no particular issue to delegations, etc. These callbacks present no particular issue to
being framed over RPC/RDMA, since such callbacks do not carry bulk being framed over RPC/RDMA, since such callbacks do not carry bulk
data such as NFS READ or NFS WRITE. They MAY be transmitted inline data such as NFS READ or NFS WRITE. They MAY be transmitted inline
via RDMA_MSG, or if the callback message or its reply overflow the via RDMA_MSG, or if the callback message or its reply overflow the
negotiated buffer sizes for a callback connection, they MAY be negotiated buffer sizes for a callback connection, they MAY be
transferred via the RDMA_NOMSG method as described above for other transferred via the RDMA_NOMSG method as described above for other
exchanges. exchanges.
One special case is noteworthy: in NFS version 4.1, the callback One special case is noteworthy: in NFS version 4.1, the callback
channel is optionally negotiated to be on the same connection as one channel is optionally negotiated to be on the same connection as one
used for client requests. In this case, and because the XID is used for client requests. In this case, and because the transaction
present in the RPC/RDMA header, the client MUST ascertain whether the ID (XID) is present in the RPC/RDMA header, the client MUST ascertain
message is in fact an RPC REPLY, and therefore a reply to a prior whether the message is in fact an RPC REPLY, and therefore a reply to
request and carrying its XID, before processing it as such. By the a prior request and carrying its XID, before processing it as such.
same token, the server MUST ascertain whether an incoming message on By the same token, the server MUST ascertain whether an incoming
such a callback-eligible connection is an RPC CALL, before optionally message on such a callback-eligible connection is an RPC CALL, before
processing the XID. optionally processing the XID.
In the callback case, the XID present in the RPC/RDMA header will In the callback case, the XID present in the RPC/RDMA header will
potentially have any value which may (or may not) collide with an XID potentially have any value, which may (or may not) collide with an
used by the client for a previous or future request. The client and XID used by the client for a previous or future request. The client
server MUST inspect the RPC component of the message to determine its and server MUST inspect the RPC component of the message to determine
potential disposition as either an RPC CALL or RPC REPLY, prior to its potential disposition as either an RPC CALL or RPC REPLY, prior
processing this XID, and MUST NOT reject or accept it without also to processing this XID, and MUST NOT reject or accept it without also
determining the proper context. determining the proper context.
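The requirement to determine the message direction before acting on the XID can be sketched for the client side as below. The sketch assumes the standard RPC message layout, in which the XID is followed by a message type of 0 for CALL and 1 for REPLY; the dispatch function and the pending table are hypothetical names, and the RPC/RDMA header is assumed to have been removed already.

```python
import struct

CALL, REPLY = 0, 1  # RPC message types


def dispatch(rpc_message: bytes, pending: dict) -> str:
    """Classify an incoming RPC message before interpreting its XID.

    `pending` maps the XIDs of outstanding client requests to a label.
    """
    xid, msg_type = struct.unpack_from(">II", rpc_message, 0)
    if msg_type == REPLY:
        # Only now is the XID matched against outstanding requests.
        if xid in pending:
            return "reply to " + pending.pop(xid)
        return "unmatched reply"
    if msg_type == CALL:
        # A callback: its XID comes from the server's own space and may
        # collide with a client XID, so `pending` is not consulted.
        return "callback"
    raise ValueError("not an RPC CALL or REPLY")


pending = {7: "READ"}
callback = struct.pack(">II", 7, CALL)  # same XID value as the pending READ
reply = struct.pack(">II", 7, REPLY)
print(dispatch(callback, pending))  # callback
print(dispatch(reply, pending))     # reply to READ
```

The colliding XID in the usage example shows why matching on the XID alone would misroute the callback as a reply.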
6. Port Usage Considerations 6. Port Usage Considerations
NFS use of direct data placement introduces a need for an additional NFS use of direct data placement introduces a need for an additional
NFS port number assignment for networks which share traditional UDP NFS port number assignment for networks that share traditional UDP
and TCP port spaces with RDMA services. The iWARP [RFC5041] and TCP port spaces with RDMA services. The iWARP [RFC5041]
[RFC5040] protocol is such an example (Infiniband is not). [RFC5040] protocol is such an example (InfiniBand is not).
NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally
listen for clients on UDP and TCP port 2049, and additionally, they listen for clients on UDP and TCP port 2049, and additionally, they
register these with the portmapper and/or rpcbind [RFC1833] service. register these with the portmapper and/or rpcbind [RFC1833] service.
However, [RFC3530] requires NFS servers for version 4 to listen on However, [RFC3530] requires NFS servers for version 4 to listen on
TCP port 2049, and they are not required to register. TCP port 2049, and they are not required to register.
An NFS version 2 or version 3 server supporting RPC/RDMA on such a An NFS version 2 or version 3 server supporting RPC/RDMA on such a
network and registering itself with the RPC portmapper MAY choose an network and registering itself with the RPC portmapper MAY choose an
arbitrary port, or MAY use the alternative well-known port number for arbitrary port, or MAY use the alternative well-known port number for
its RPC/RDMA service. The chosen port MAY be registered with the RPC its RPC/RDMA service. The chosen port MAY be registered with the RPC
portmapper under the netid assigned by the requirement in [RPCRDMA]. portmapper under the netid assigned by the requirement in [RFC5666].
An NFS version 4 server supporting RPC/RDMA on such a network MUST An NFS version 4 server supporting RPC/RDMA on such a network MUST
use the alternative well-known port number for its RPC/RDMA service. use the alternative well-known port number for its RPC/RDMA service.
Clients SHOULD connect to this well-known port without consulting the Clients SHOULD connect to this well-known port without consulting the
RPC portmapper (as for NFSv4/TCP). RPC portmapper (as for NFSv4/TCP).
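The port selection rules in the preceding paragraphs can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the constant 20049 is assumed to be the value listed for the NFS over RDMA service in the IANA port registry and should be confirmed there. The function names and the rpcbind_lookup callable are hypothetical. The text does not spell out how a version 2 or 3 client finds the port, so consulting the portmapper is also an assumption.

```python
# Assumption: 20049 is the IANA registry value for NFS over RDMA;
# confirm against the registry before relying on it.
NFS_RDMA_WELL_KNOWN_PORT = 20049


def server_rdma_port(nfs_version: int, arbitrary_port=None) -> int:
    """Pick the port an NFS server offers for its RPC/RDMA service."""
    if nfs_version == 4:
        # Version 4 servers MUST use the alternative well-known port.
        return NFS_RDMA_WELL_KNOWN_PORT
    if nfs_version in (2, 3):
        # Version 2 and 3 servers MAY choose an arbitrary port or MAY
        # use the well-known one.
        if arbitrary_port is not None:
            return arbitrary_port
        return NFS_RDMA_WELL_KNOWN_PORT
    raise ValueError(f"unsupported NFS version {nfs_version}")


def client_rdma_port(nfs_version: int, rpcbind_lookup=None) -> int:
    """Pick the port an NFS client connects to for RPC/RDMA."""
    if nfs_version == 4:
        # Clients SHOULD connect without consulting the portmapper.
        return NFS_RDMA_WELL_KNOWN_PORT
    if nfs_version in (2, 3):
        if rpcbind_lookup is None:
            raise ValueError("a portmapper lookup is needed for NFSv2/v3")
        # Hypothetical callable that returns the registered port.
        return rpcbind_lookup()
    raise ValueError(f"unsupported NFS version {nfs_version}")
```

For example, server_rdma_port(4, arbitrary_port=5000) ignores the requested port and returns the well-known one, while server_rdma_port(3, arbitrary_port=5000) returns 5000.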
The port number assigned to an NFS service over an RPC/RDMA transport The port number assigned to an NFS service over an RPC/RDMA transport
is available from the IANA port registry [RFC3232]. is available from the IANA port registry [RFC3232].
7. Security Considerations 7. Security Considerations
The RDMA transport for RPC [RPCRDMA] supports all RPC [RFC1831bis] The RDMA transport for RPC [RFC5666] supports all RPC [RFC5531]
security models, including RPCSEC_GSS [RFC2203] security and link- security models, including RPCSEC_GSS [RFC2203] security and link-
level security. The choice of RDMA Read and RDMA Write to return RPC level security. The choice of RDMA Read and RDMA Write to return RPC
argument and results, respectively, does not affect this, since it argument and results, respectively, does not affect this, since it
only changes the method of data transfer. Specifically, the only changes the method of data transfer. Specifically, the
requirements of [RPCRDMA] ensure that this choice does not introduce requirements of [RFC5666] ensure that this choice does not introduce
new vulnerabilities. new vulnerabilities.
Because this document defines only the binding of the NFS protocols Because this document defines only the binding of the NFS protocols
atop [RPCRDMA], all relevant security considerations are therefore to atop [RFC5666], all relevant security considerations are therefore to
be described at that layer. be described at that layer.
8. IANA Considerations 8. Acknowledgments
This document has no IANA considerations.
9. Acknowledgments
The authors would like to thank Dave Noveck and Chet Juszczak for The authors would like to thank Dave Noveck and Chet Juszczak for
their contributions to this document. their contributions to this document.
10. Normative References 9. References
[RFC2119]
S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels",
Best Current Practice,
BCP 14, RFC 2119, March 1997.
[RFC1094]
"NFS: Network File System Protocol Specification",
(NFS version 2) Informational RFC,
http://www.ietf.org/rfc/rfc1094.txt
[RFC1831bis]
R. Thurlow, Ed., "RPC: Remote Procedure Call Protocol
Specification Version 2",
Standards Track RFC
[RFC1813]
B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3 Protocol
Specification",
Informational RFC,
http://www.ietf.org/rfc/rfc1813.txt
[RFC1833]
R. Srinivasan, "Binding Protocols for ONC RPC Version 2",
Standards Track RFC,
http://www.ietf.org/rfc/rfc1833.txt
[RFC3530]
S. Shepler, et al., "NFS version 4 Protocol",
Standards Track RFC,
http://www.ietf.org/rfc/rfc3530.txt
[NFSv4.1]
S. Shepler et al., ed., "NFSv4 Minor Version 1"
Internet Draft Work in Progress,
draft-ietf-nfsv4-minorversion1
[RFC2203] 9.1. Normative References
M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol Specification",
Standards Track RFC,
http://www.ietf.org/rfc/rfc2203.txt
11. Informative References [RFC1094] Sun Microsystems, "NFS: Network File System Protocol
specification", RFC 1094, March 1989.
[RFC3232] [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
Internet Assigned Numbers Authority (IANA), Version 3 Protocol Specification", RFC 1813, June 1995.
Port Registry database,
http://www.ietf.org/rfc/rfc3232.txt
http://www.iana.org/assignments/port-numbers
[RPCRDMA] [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
T. Talpey, B. Callaghan, "Remote Direct Memory Access Transport RFC 1833, August 1995.
for Remote Procedure Call"
Internet Draft Work in Progress,
draft-ietf-nfsv4-rpcrdma
[RFC5041] [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
H. Shah et al., "Direct Data Placement over Reliable Transports", Requirement Levels", BCP 14, RFC 2119, March 1997.
Standards Track RFC
[RFC5040] [RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
R. Recio et al., "A Remote Direct Memory Access Protocol Specification", RFC 2203, September 1997.
Specification",
Standards Track RFC
12. Authors' Addresses [RFC3530] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
Beame, C., Eisler, M., and D. Noveck, "Network File System
(NFS) version 4 Protocol", RFC 3530, April 2003.
Tom Talpey [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol
Network Appliance, Inc. Specification Version 2", RFC 5531, May 2009.
1601 Trapelo Road, #16
Waltham, MA 02451 USA
Phone: +1 781 768 5329 [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
EMail: thomas.talpey@netapp.com "Network File System (NFS) Version 4 Minor Version 1
Brent Callaghan Protocol", RFC 5661, January 2010.
Apple Computer, Inc.
MS: 302-4K
2 Infinite Loop
Cupertino, CA 95014 USA
EMail: brentc@apple.com 9.2. Informative References
13. Intellectual Property and Copyright Statements [RFC3232] Reynolds, J., Ed., "Assigned Numbers: RFC 1700 is Replaced
by an On-line Database", RFC 3232, January 2002.
Full Copyright Statement [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
Garcia, "A Remote Direct Memory Access Protocol
Specification", RFC 5040, October 2007.
Copyright (C) The IETF Trust (2008). [RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
Data Placement over Reliable Transports", RFC 5041,
October 2007.
This document is subject to the rights, licenses and restrictions [RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access
contained in BCP 78, and except as set forth therein, the authors Transport for Remote Procedure Call", RFC 5666, January
retain all their rights. 2010.
This document and the information contained herein are provided on Authors' Addresses
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Intellectual Property Tom Talpey
The IETF takes no position regarding the validity or scope of any 170 Whitman St.
Intellectual Property Rights or other rights that might be claimed Stow, MA 01775 USA
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any EMail: tmtalpey@gmail.com
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any Brent Callaghan
copyrights, patents or patent applications, or other proprietary Apple Computer, Inc.
rights that may cover technology that may be required to implement MS: 302-4K
this standard. Please address the information to the IETF at ietf- 2 Infinite Loop
ipr@ietf.org. Cupertino, CA 95014 USA
Acknowledgment EMail: brentc@apple.com
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).
 End of changes. 61 change blocks. 
211 lines changed or deleted 156 lines changed or added
