draft-ietf-nfsv4-nfsdirect-07.txt   draft-ietf-nfsv4-nfsdirect-08.txt 
NFSv4 Working Group Tom Talpey NFSv4 Working Group Tom Talpey
Internet-Draft NetApp Internet-Draft NetApp
Intended status: Standards Track Brent Callaghan Intended status: Standards Track Brent Callaghan
Expires: August 23, 2008 Apple Expires: October 17, 2008 Apple
February 22, 2008 April 16, 2008
NFS Direct Data Placement NFS Direct Data Placement
draft-ietf-nfsv4-nfsdirect-07 draft-ietf-nfsv4-nfsdirect-08
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 2, line 13 skipping to change at page 2, line 13
3, 4 and 4.1 over such an RDMA transport. 3, 4 and 4.1 over such an RDMA transport.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Transfers from NFS Client to NFS Server . . . . . . . . . . 2 2. Transfers from NFS Client to NFS Server . . . . . . . . . . 2
3. Transfers from NFS Server to NFS Client . . . . . . . . . . 3 3. Transfers from NFS Server to NFS Client . . . . . . . . . . 3
4. NFS Versions 2 and 3 Mapping . . . . . . . . . . . . . . . . 4 4. NFS Versions 2 and 3 Mapping . . . . . . . . . . . . . . . . 4
5. NFS Version 4 Mapping . . . . . . . . . . . . . . . . . . . 5 5. NFS Version 4 Mapping . . . . . . . . . . . . . . . . . . . 5
5.1. NFS Version 4 Callbacks . . . . . . . . . . . . . . . . . 7 5.1. NFS Version 4 Callbacks . . . . . . . . . . . . . . . . . 7
6. Security Considerations . . . . . . . . . . . . . . . . . . 8 6. Port Usage Considerations . . . . . . . . . . . . . . . . . 8
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . 8 7. Security Considerations . . . . . . . . . . . . . . . . . . 8
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . 8
9. Normative References . . . . . . . . . . . . . . . . . . . . 9 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . 9
10. Informative References . . . . . . . . . . . . . . . . . 10 10. Normative References . . . . . . . . . . . . . . . . . . . 9
11. Authors' Addresses . . . . . . . . . . . . . . . . . . . 10 11. Informative References . . . . . . . . . . . . . . . . . 10
12. Intellectual Property and Copyright Statements . . . . . 10 12. Authors' Addresses . . . . . . . . . . . . . . . . . . . 10
13. Intellectual Property and Copyright Statements . . . . . 11
Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 11 Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 11
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
1. Introduction 1. Introduction
skipping to change at page 2, line 47 skipping to change at page 2, line 48
and server must agree on a consistent mapping of posted buffers to and server must agree on a consistent mapping of posted buffers to
RPC. This document details the mapping for each version of the NFS RPC. This document details the mapping for each version of the NFS
protocol [RFC1094] [RFC1813] [RFC3530] [NFSv4.1]. protocol [RFC1094] [RFC1813] [RFC3530] [NFSv4.1].
2. Transfers from NFS Client to NFS Server 2. Transfers from NFS Client to NFS Server
The RDMA Read list, in the RDMA transport header, allows an RPC The RDMA Read list, in the RDMA transport header, allows an RPC
client to marshal RPC call data selectively. Large chunks of data, client to marshal RPC call data selectively. Large chunks of data,
such as the file data of an NFS WRITE request, MAY be referenced by such as the file data of an NFS WRITE request, MAY be referenced by
an RDMA Read list and be moved efficiently and directly-placed by an an RDMA Read list and be moved efficiently and directly-placed by an
RDMA READ operation initiated by the server. RDMA Read operation initiated by the server.
The process of identifying these chunks for the RDMA Read list can be The process of identifying these chunks for the RDMA Read list can be
implemented entirely within the RPC layer. It is transparent to the implemented entirely within the RPC layer. It is transparent to the
upper-level protocol, such as NFS. For instance, the file data upper-level protocol, such as NFS. For instance, the file data
portion of an NFS WRITE request can be selected as an RDMA "chunk" portion of an NFS WRITE request can be selected as an RDMA "chunk"
within the XDR marshaling code of RPC based on a size criterion, within the eXternal Data Representation (XDR) marshaling code of RPC
independently of the NFS protocol layer. The XDR unmarshaling on the based on a size criterion, independently of the NFS protocol layer.
receiving system can identify the correspondence between Read chunks The XDR unmarshaling on the receiving system can identify the
and protocol elements via the XDR position value encoded in the Read correspondence between Read chunks and protocol elements via the XDR
chunk entry. position value encoded in the Read chunk entry.
RPC RDMA Read chunks are employed by this NFS mapping to convey RPC RDMA Read chunks are employed by this NFS mapping to convey
specific NFS data to the server in a manner which may be directly specific NFS data to the server in a manner which may be directly
placed. The following sections describe this mapping for versions of placed. The following sections describe this mapping for versions of
the NFS protocol. the NFS protocol.
3. Transfers from NFS Server to NFS Client 3. Transfers from NFS Server to NFS Client
The RDMA Write list, in the RDMA transport header, allows the client The RDMA Write list, in the RDMA transport header, allows the client
to post one or more buffers into which the server will RDMA Write to post one or more buffers into which the server will RDMA Write
designated result chunks directly. If the client sends a null write designated result chunks directly. If the client sends a null Write
list, then results from the RPC call will be returned as either an list, then results from the RPC call will be returned as either an
inline reply, as chunks in an RDMA Read list of server-posted inline reply, as chunks in an RDMA Read list of server-posted
buffers, or in a client-posted reply buffer. buffers, or in a client-posted reply buffer.
Each posted buffer in a Write list is represented as an array of Each posted buffer in a Write list is represented as an array of
memory segments. This allows the client some flexibility in memory segments. This allows the client some flexibility in
submitting discontiguous memory segments into which the server will submitting discontiguous memory segments into which the server will
scatter the result. Each segment is described by a triplet scatter the result. Each segment is described by a triplet
consisting of the segment handle or steering tag (STag), segment consisting of the segment handle or steering tag (STag), segment
length, and memory address or offset. length, and memory address or offset.
skipping to change at page 4, line 11 skipping to change at page 4, line 12
which MUST be large enough to accept the result. If the buffer is which MUST be large enough to accept the result. If the buffer is
too small, the server MUST return an XDR encode error. The server too small, the server MUST return an XDR encode error. The server
MUST return the result data for a posted buffer by progressively MUST return the result data for a posted buffer by progressively
filling its segments, perhaps leaving some trailing segments unfilled filling its segments, perhaps leaving some trailing segments unfilled
or partially full if the size of the result is less than the total or partially full if the size of the result is less than the total
size of the buffer segments. size of the buffer segments.
The server returns the RDMA Write list to the client with the segment The server returns the RDMA Write list to the client with the segment
length fields overwritten to indicate the amount of data RDMA Written length fields overwritten to indicate the amount of data RDMA Written
to each segment. Results returned by direct placement MUST NOT be to each segment. Results returned by direct placement MUST NOT be
returned by other methods, e.g., by read chunk list or inline. If no returned by other methods, e.g., by Read chunk list or inline. If no
result data at all is returned for the element, the server places no result data at all is returned for the element, the server places no
data in the buffer(s), but does return zeroes in the segment length data in the buffer(s), but does return zeroes in the segment length
fields corresponding to the result. fields corresponding to the result.
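The segment-filling rule above can be illustrated with a minimal sketch. It is not part of either draft revision: the names Segment and place_result are hypothetical, and the ValueError stands in for the XDR encode error that the text requires when the posted buffer is too small.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    handle: int  # segment handle or steering tag (STag)
    length: int  # segment length in bytes
    offset: int  # memory address or offset


def place_result(segments: List[Segment], result_len: int) -> List[Segment]:
    """Return the Write list entry with lengths overwritten as the server would.

    Hypothetical helper: it only computes how many bytes land in each
    segment; no RDMA operation is performed.
    """
    if result_len > sum(s.length for s in segments):
        # Stand-in for the XDR encode error required by the text.
        raise ValueError("posted buffer too small for result")
    returned = []
    remaining = result_len
    for seg in segments:
        # Fill segments progressively; trailing segments may stay
        # partially full or empty.
        written = min(seg.length, remaining)
        returned.append(Segment(seg.handle, written, seg.offset))
        remaining -= written
    return returned
```

Under these assumptions, a result that is shorter than the posted buffer leaves the trailing segments with a returned length smaller than the posted length, and a result with no data returns zero in every length field.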
The RDMA Write list allows the client to provide multiple result The RDMA Write list allows the client to provide multiple result
buffers - each buffer maps to a specific result in the reply. The NFS buffers - each buffer maps to a specific result in the reply. The
client and server implementations agree by specifying the mapping of NFS client and server implementations agree by specifying the mapping
results to buffers for each RPC procedure. The following sections of results to buffers for each RPC procedure. The following sections
describe this mapping for versions of the NFS protocol. describe this mapping for versions of the NFS protocol.
Through the use of RDMA Write lists in NFS requests, it is not Through the use of RDMA Write lists in NFS requests, it is not
necessary to employ the RDMA Read lists in the NFS replies, as necessary to employ the RDMA Read lists in the NFS replies, as
described in the RPC/RDMA protocol. This enables more efficient described in the RPC/RDMA protocol. This enables more efficient
operation, by avoiding the need for the server to expose buffers for operation, by avoiding the need for the server to expose buffers for
RDMA, and also avoiding "RDMA_DONE" exchanges. Clients MAY RDMA, and also avoiding "RDMA_DONE" exchanges. Clients MAY
additionally employ RDMA Reply chunks to receive entire messages, as additionally employ RDMA Reply chunks to receive entire messages, as
described in [RPCRDMA]. described in [RPCRDMA].
skipping to change at page 5, line 37 skipping to change at page 5, line 38
sizes. sizes.
Flow control is handled dynamically by the RPC RDMA protocol, and Flow control is handled dynamically by the RPC RDMA protocol, and
write padding is OPTIONAL and therefore MAY remain unused. write padding is OPTIONAL and therefore MAY remain unused.
Alternatively, if the server is administratively configured to values Alternatively, if the server is administratively configured to values
appropriate for all its clients, the same assurance of appropriate for all its clients, the same assurance of
interoperability within the domain can be made. interoperability within the domain can be made.
The use of a configuration protocol with NFS v2 and v3 is therefore The use of a configuration protocol with NFS v2 and v3 is therefore
OPTIONAL. Employing a configuration exchange may allow some advantage OPTIONAL. Employing a configuration exchange may allow some
to server resource management through accurately sizing buffers, advantage to server resource management through accurately sizing
enabling the server to know exactly how many RDMA Reads may be in buffers, enabling the server to know exactly how many RDMA Reads may
progress at once on the client connection, and enabling client write be in progress at once on the client connection, and enabling client
padding which may be desirable for certain servers when RDMA Read is write padding which may be desirable for certain servers when RDMA
impractical. Read is impractical.
5. NFS Version 4 Mapping 5. NFS Version 4 Mapping
This specification applies to the first minor version of NFS version This specification applies to the first minor version of NFS version
4 (NFSv4.0) and any subsequent minor versions that do not override 4 (NFSv4.0) and any subsequent minor versions that do not override
this mapping. this mapping.
The Write list MUST be considered only for the COMPOUND procedure. The Write list MUST be considered only for the COMPOUND procedure.
This procedure returns results from a sequence of operations. Only This procedure returns results from a sequence of operations. Only
the opaque file data from an NFS READ operation, and the pathname the opaque file data from an NFS READ operation, and the pathname
from a READLINK operation MUST utilize entries from the Write list. from a READLINK operation MUST utilize entries from the Write list.
If there is no Write list, i.e., the list is null, then any READ or If there is no Write list, i.e., the list is null, then any READ or
READLINK operations in the COMPOUND MUST return their data inline. READLINK operations in the COMPOUND MUST return their data inline.
The NFSv4.0 client MUST ensure in this case that any result of its The NFSv4.0 client MUST ensure in this case that any result of its
READ and READLINK requests will fit within its receive buffers, in READ and READLINK requests will fit within its receive buffers, in
order to avoid a resulting RDMA transport error upon transfer. The order to avoid a resulting RDMA transport error upon transfer. The
server is not required to detect this. server is not required to detect this.
The first entry in the Write list MUST be used by the first READ or The first entry in the Write list MUST be used by the first READ or
READLINK in the COMPOUND request. The next Write list entry READLINK in the COMPOUND request. The next Write list entry
by the next READ or READLINK, and so on. If there are more READ or by the next READ or READLINK, and so on. If there are more READ or
READLINK operations than Write list entries, then any remaining READLINK operations than Write list entries, then any remaining
operations MUST return their results inline. operations MUST return their results inline.
If a Write list entry is presented, then the corresponding READ or If a Write list entry is presented, then the corresponding READ or
READLINK MUST return its data via an RDMA WRITE to the buffer READLINK MUST return its data via an RDMA Write to the buffer
indicated by the Write list entry. If the Write list entry has zero indicated by the Write list entry. If the Write list entry has zero
RDMA segments, or if the total size of the segments is zero, then the RDMA segments, or if the total size of the segments is zero, then the
corresponding READ or READLINK operation MUST return its result corresponding READ or READLINK operation MUST return its result
inline. inline.
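As a rough sketch of the ordering rules above (not taken from either draft revision), the following shows how Write list entries might be matched to the READ and READLINK operations of a COMPOUND. The function name plan_compound, the result labels, and the representation of a Write list entry as a list of segment lengths are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def plan_compound(ops: List[str],
                  write_list: Optional[List[List[int]]]) -> List[Tuple[str, str]]:
    """Decide how each operation in a COMPOUND returns its data.

    ops is the ordered list of operation names. write_list is None for a
    null Write list; otherwise each entry is a list of segment lengths.
    """
    entries = iter(write_list or [])
    plan = []
    for op in ops:
        if op not in ("READ", "READLINK"):
            # Other operations never use Write list entries.
            plan.append((op, "not-applicable"))
            continue
        entry = next(entries, None)
        if entry is None:
            # Null Write list, or more READ/READLINK operations than entries.
            plan.append((op, "inline"))
        elif sum(entry) == 0:
            # Entry with zero segments or zero total size: the entry is
            # consumed, but the result is returned inline.
            plan.append((op, "inline"))
        else:
            plan.append((op, "rdma-write"))
    return plan
```

In this sketch an empty or zero-length entry still counts as the entry for its operation, so later entries keep their positions relative to later READ or READLINK operations.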
The following example shows an RDMA Write list with three posted The following example shows an RDMA Write list with three posted
buffers A, B, and C. The designated operations in the compound buffers A, B, and C. The designated operations in the compound
request, READ and READLINK, consume the posted buffers by writing request, READ and READLINK, consume the posted buffers by writing
their results back to each buffer. their results back to each buffer.
skipping to change at page 7, line 44 skipping to change at page 7, line 44
sizes to better manage the server's response cache. An extension to sizes to better manage the server's response cache. An extension to
NFS version 4 supporting a more comprehensive exchange of upper layer NFS version 4 supporting a more comprehensive exchange of upper layer
parameters is part of [NFSv4.1]. parameters is part of [NFSv4.1].
5.1. NFS Version 4 Callbacks 5.1. NFS Version 4 Callbacks
The NFS version 4 protocols support server-initiated callbacks to The NFS version 4 protocols support server-initiated callbacks to
selected clients, in order to notify them of events such as recalled selected clients, in order to notify them of events such as recalled
delegations, etc. These callbacks present no particular issue to delegations, etc. These callbacks present no particular issue to
being framed over RPC/RDMA, since such callbacks do not carry bulk being framed over RPC/RDMA, since such callbacks do not carry bulk
data such as read or write. They MAY be transmitted inline via data such as NFS READ or NFS WRITE. They MAY be transmitted inline
RDMA_MSG, or if the callback message or its reply overflow the via RDMA_MSG, or if the callback message or its reply overflow the
negotiated buffer sizes for a callback connection, they MAY be negotiated buffer sizes for a callback connection, they MAY be
transferred via the RDMA_NOMSG method as described above for other transferred via the RDMA_NOMSG method as described above for other
exchanges. exchanges.
One special case is noteworthy: in NFS version 4.1, the callback One special case is noteworthy: in NFS version 4.1, the callback
channel is optionally negotiated to be on the same connection as one channel is optionally negotiated to be on the same connection as one
used for client requests. In this case, and because the XID is used for client requests. In this case, and because the XID is
present in the RPC/RDMA header, the client MUST ascertain whether the present in the RPC/RDMA header, the client MUST ascertain whether the
message is in fact an RPC REPLY, and therefore a reply to a prior message is in fact an RPC REPLY, and therefore a reply to a prior
request and carrying its XID, before processing it as such. By the request and carrying its XID, before processing it as such. By the
skipping to change at page 8, line 19 skipping to change at page 8, line 19
processing the XID. processing the XID.
In the callback case, the XID present in the RPC/RDMA header will In the callback case, the XID present in the RPC/RDMA header will
potentially have any value which may (or may not) collide with an XID potentially have any value which may (or may not) collide with an XID
used by the client for a previous or future request. The client and used by the client for a previous or future request. The client and
server MUST inspect the RPC component of the message to determine its server MUST inspect the RPC component of the message to determine its
potential disposition as either an RPC CALL or RPC REPLY, prior to potential disposition as either an RPC CALL or RPC REPLY, prior to
processing this XID, and MUST NOT reject or accept it without also processing this XID, and MUST NOT reject or accept it without also
determining the proper context. determining the proper context.
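The check described above can be sketched as follows. This is an illustration only, not text from either draft revision: the function classify and its return labels are hypothetical, and the sketch assumes the standard ONC RPC message layout in which a 32-bit XID is followed by a 32-bit message type (0 for CALL, 1 for REPLY).

```python
import struct
from typing import Set, Tuple

RPC_CALL = 0   # msg_type value for an RPC call
RPC_REPLY = 1  # msg_type value for an RPC reply


def classify(rpc_message: bytes, pending_xids: Set[int]) -> Tuple[str, int]:
    """Classify a message received on a connection shared with callbacks.

    The message type is examined before the XID is matched, so a
    callback whose XID collides with an outstanding request is not
    mistaken for a reply.
    """
    if len(rpc_message) < 8:
        raise ValueError("RPC message too short")
    xid, msg_type = struct.unpack(">II", rpc_message[:8])
    if msg_type == RPC_CALL:
        return ("callback-call", xid)
    if msg_type == RPC_REPLY:
        if xid in pending_xids:
            return ("reply", xid)
        return ("unmatched-reply", xid)
    raise ValueError("unknown RPC message type")
```

With one request outstanding under XID 7, a callback CALL that also carries XID 7 is classified as a callback rather than matched to the pending request.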
6. Security Considerations 6. Port Usage Considerations
The RDMA transport for RPC [RPCRDMA] supports all RPC [RFC1831bis]
security models, including RPCSEC_GSS [RFC2203] security and link-
level security. The choice of RDMA Read and RDMA Write to return RPC
argument and results, respectively, does not affect this, since it
only changes the method of data transfer. Specifically, the
requirements of [RPCRDMA] ensure that this choice does not introduce
new vulnerabilities.
Because this document defines only the binding of the NFS protocols
atop [RPCRDMA], all relevant security considerations are therefore to
be described at that layer.
7. IANA Considerations
NFS use of direct data placement introduces a need for an additional NFS use of direct data placement introduces a need for an additional
NFS port number assignment for networks which share traditional UDP NFS port number assignment for networks which share traditional UDP
and TCP port spaces with RDMA services. The iWARP [RFC5041] and TCP port spaces with RDMA services. The iWARP [RFC5041]
[RFC5040] protocol is such an example (InfiniBand is not). [RFC5040] protocol is such an example (InfiniBand is not).
NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally
listen for clients on UDP and TCP port 2049, and additionally, they listen for clients on UDP and TCP port 2049, and additionally, they
register these with the portmapper and/or rpcbind [RFC1833] service. register these with the portmapper and/or rpcbind [RFC1833] service.
However, [RFC3530] requires NFS servers for version 4 to listen on However, [RFC3530] requires NFS servers for version 4 to listen on
skipping to change at page 9, line 13 skipping to change at page 8, line 46
portmapper under the netid assigned by the requirement in [RPCRDMA]. portmapper under the netid assigned by the requirement in [RPCRDMA].
An NFS version 4 server supporting RPC/RDMA on such a network MUST An NFS version 4 server supporting RPC/RDMA on such a network MUST
use the alternative well-known port number for its RPC/RDMA service. use the alternative well-known port number for its RPC/RDMA service.
Clients SHOULD connect to this well-known port without consulting the Clients SHOULD connect to this well-known port without consulting the
RPC portmapper (as for NFSv4/TCP). RPC portmapper (as for NFSv4/TCP).
The port number assigned to an NFS service over an RPC/RDMA transport The port number assigned to an NFS service over an RPC/RDMA transport
is available from the IANA port registry [RFC3232]. is available from the IANA port registry [RFC3232].
8. Acknowledgements 7. Security Considerations
The RDMA transport for RPC [RPCRDMA] supports all RPC [RFC1831bis]
security models, including RPCSEC_GSS [RFC2203] security and link-
level security. The choice of RDMA Read and RDMA Write to return RPC
argument and results, respectively, does not affect this, since it
only changes the method of data transfer. Specifically, the
requirements of [RPCRDMA] ensure that this choice does not introduce
new vulnerabilities.
Because this document defines only the binding of the NFS protocols
atop [RPCRDMA], all relevant security considerations are therefore to
be described at that layer.
8. IANA Considerations
This document has no IANA considerations.
9. Acknowledgments
The authors would like to thank Dave Noveck and Chet Juszczak for The authors would like to thank Dave Noveck and Chet Juszczak for
their contributions to this document. their contributions to this document.
9. Normative References 10. Normative References
[RFC2119] [RFC2119]
S. Bradner, "Key words for use in RFCs to Indicate Requirement S. Bradner, "Key words for use in RFCs to Indicate Requirement
Levels", Levels",
Best Current Practice, Best Current Practice,
BCP 14, RFC 2119, March 1997. BCP 14, RFC 2119, March 1997.
[RFC1094] [RFC1094]
"NFS: Network File System Protocol Specification", "NFS: Network File System Protocol Specification",
(NFS version 2) Informational RFC, (NFS version 2) Informational RFC,
skipping to change at page 10, line 13 skipping to change at page 10, line 17
[NFSv4.1] [NFSv4.1]
S. Shepler et al., ed., "NFSv4 Minor Version 1" S. Shepler et al., ed., "NFSv4 Minor Version 1"
Internet Draft Work in Progress, Internet Draft Work in Progress,
draft-ietf-nfsv4-minorversion1 draft-ietf-nfsv4-minorversion1
[RFC2203] [RFC2203]
M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol Specification", M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol Specification",
Standards Track RFC, Standards Track RFC,
http://www.ietf.org/rfc/rfc2203.txt http://www.ietf.org/rfc/rfc2203.txt
10. Informative References 11. Informative References
[RFC3232] [RFC3232]
Internet Assigned Numbers Authority (IANA), Internet Assigned Numbers Authority (IANA),
Port Registry database, Port Registry database,
http://www.ietf.org/rfc/rfc3232.txt http://www.ietf.org/rfc/rfc3232.txt
http://www.iana.org/assignments/port-numbers http://www.iana.org/assignments/port-numbers
[RPCRDMA] [RPCRDMA]
T. Talpey, B. Callaghan, "Remote Direct Memory Access Transport T. Talpey, B. Callaghan, "Remote Direct Memory Access Transport
for Remote Procedure Call" for Remote Procedure Call"
skipping to change at page 10, line 36 skipping to change at page 10, line 40
[RFC5041] [RFC5041]
H. Shah et al., "Direct Data Placement over Reliable Transports", H. Shah et al., "Direct Data Placement over Reliable Transports",
Standards Track RFC Standards Track RFC
[RFC5040] [RFC5040]
R. Recio et al., "A Remote Direct Memory Access Protocol R. Recio et al., "A Remote Direct Memory Access Protocol
Specification", Specification",
Standards Track RFC Standards Track RFC
11. Authors' Addresses 12. Authors' Addresses
Tom Talpey Tom Talpey
Network Appliance, Inc. Network Appliance, Inc.
1601 Trapelo Road, #16 1601 Trapelo Road, #16
Waltham, MA 02451 USA Waltham, MA 02451 USA
Phone: +1 781 768 5329 Phone: +1 781 768 5329
EMail: thomas.talpey@netapp.com EMail: thomas.talpey@netapp.com
Brent Callaghan Brent Callaghan
Apple Computer, Inc. Apple Computer, Inc.
MS: 302-4K MS: 302-4K
2 Infinite Loop 2 Infinite Loop
Cupertino, CA 95014 USA Cupertino, CA 95014 USA
EMail: brentc@apple.com EMail: brentc@apple.com
12. Intellectual Property and Copyright Statements 13. Intellectual Property and Copyright Statements
Full Copyright Statement Full Copyright Statement
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors contained in BCP 78, and except as set forth therein, the authors
retain all their rights. retain all their rights.
This document and the information contained herein are provided on This document and the information contained herein are provided on
 End of changes. 18 change blocks. 
51 lines changed or deleted 55 lines changed or added
