draft-ietf-nfsv4-rpcrdma-07.txt   draft-ietf-nfsv4-rpcrdma-08.txt 
NFSv4 Working Group Tom Talpey NFSv4 Working Group Tom Talpey
Internet-Draft NetApp Internet-Draft NetApp
Intended status: Standards Track Brent Callaghan Intended status: Standards Track Brent Callaghan
Expires: August 23, 2008 Apple Expires: October 17, 2008 Apple
February 22, 2008 April 16, 2008
Remote Direct Memory Access Transport for Remote Procedure Call Remote Direct Memory Access Transport for Remote Procedure Call
draft-ietf-nfsv4-rpcrdma-07 draft-ietf-nfsv4-rpcrdma-08
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 43 skipping to change at page 1, line 43
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
A protocol is described providing Remote Direct Memory Access A protocol is described providing Remote Direct Memory Access
(RDMA) as a new transport for Computing Remote Procedure Call (RDMA) as a new transport for Remote Procedure Call (RPC). The
(RPC). The RDMA transport binding conveys the benefits of RDMA transport binding conveys the benefits of efficient, bulk data
efficient, bulk data transport over high speed networks, while transport over high speed networks, while providing for minimal
providing for minimal change to RPC applications and with no change to RPC applications and with no required revision of the
required revision of the application RPC protocol, or the RPC application RPC protocol, or the RPC protocol itself.
protocol itself.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Abstract RDMA Requirements . . . . . . . . . . . . . . . . . 3 2. Abstract RDMA Requirements . . . . . . . . . . . . . . . . . 3
3. Protocol Outline . . . . . . . . . . . . . . . . . . . . . . 4 3. Protocol Outline . . . . . . . . . . . . . . . . . . . . . . 4
3.1. Short Messages . . . . . . . . . . . . . . . . . . . . . . 5 3.1. Short Messages . . . . . . . . . . . . . . . . . . . . . . 5
3.2. Data Chunks . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Data Chunks . . . . . . . . . . . . . . . . . . . . . . . 5
3.3. Flow Control . . . . . . . . . . . . . . . . . . . . . . . 6 3.3. Flow Control . . . . . . . . . . . . . . . . . . . . . . . 6
3.4. XDR Encoding with Chunks . . . . . . . . . . . . . . . . . 7 3.4. XDR Encoding with Chunks . . . . . . . . . . . . . . . . . 7
skipping to change at page 2, line 35 skipping to change at page 2, line 35
5.2. RDMA Write of Long Replies (Reply Chunks) . . . . . . . 24 5.2. RDMA Write of Long Replies (Reply Chunks) . . . . . . . 24
6. Connection Configuration Protocol . . . . . . . . . . . . 25 6. Connection Configuration Protocol . . . . . . . . . . . . 25
6.1. Initial Connection State . . . . . . . . . . . . . . . . 26 6.1. Initial Connection State . . . . . . . . . . . . . . . . 26
6.2. Protocol Description . . . . . . . . . . . . . . . . . . 26 6.2. Protocol Description . . . . . . . . . . . . . . . . . . 26
7. Memory Registration Overhead . . . . . . . . . . . . . . . 28 7. Memory Registration Overhead . . . . . . . . . . . . . . . 28
8. Errors and Error Recovery . . . . . . . . . . . . . . . . 28 8. Errors and Error Recovery . . . . . . . . . . . . . . . . 28
9. Node Addressing . . . . . . . . . . . . . . . . . . . . . 28 9. Node Addressing . . . . . . . . . . . . . . . . . . . . . 28
10. RPC Binding . . . . . . . . . . . . . . . . . . . . . . . 29 10. RPC Binding . . . . . . . . . . . . . . . . . . . . . . . 29
11. Security Considerations . . . . . . . . . . . . . . . . . 30 11. Security Considerations . . . . . . . . . . . . . . . . . 30
12. IANA Considerations . . . . . . . . . . . . . . . . . . . 31 12. IANA Considerations . . . . . . . . . . . . . . . . . . . 31
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . 32 13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . 32
14. Normative References . . . . . . . . . . . . . . . . . . 32 14. Normative References . . . . . . . . . . . . . . . . . . 32
15. Informative References . . . . . . . . . . . . . . . . . 33 15. Informative References . . . . . . . . . . . . . . . . . 33
16. Authors' Addresses . . . . . . . . . . . . . . . . . . . 34 16. Authors' Addresses . . . . . . . . . . . . . . . . . . . 34
17. Intellectual Property and Copyright Statements . . . . . 35 17. Intellectual Property and Copyright Statements . . . . . 35
Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 36 Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 36
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
skipping to change at page 5, line 20 skipping to change at page 5, line 20
transports such as iWARP [RFC5040, RFC5041] or Infiniband [IB]. transports such as iWARP [RFC5040, RFC5041] or Infiniband [IB].
3. Protocol Outline 3. Protocol Outline
An RPC message can be conveyed in identical fashion, whether it is An RPC message can be conveyed in identical fashion, whether it is
a call or reply message. In each case, the transmission of the a call or reply message. In each case, the transmission of the
message proper is preceded by transmission of a transport-specific message proper is preceded by transmission of a transport-specific
header for use by RPC over RDMA transports. This header is header for use by RPC over RDMA transports. This header is
analogous to the record marking used for RPC over TCP, but is more analogous to the record marking used for RPC over TCP, but is more
extensive, since RDMA transports support several modes of data extensive, since RDMA transports support several modes of data
transfer and it is important to allow the client and server to use transfer and it is important to allow the upper layer protocol to
the most efficient mode for any given transfer. Multiple segments specify the most efficient mode for each of the segments in a
of a message may be transferred in different ways to different message. Multiple segments of a message may thereby be transferred
remote memory destinations. in different ways to different remote memory destinations.
All transfers of a call or reply begin with an RDMA Send which All transfers of a call or reply begin with an RDMA Send which
transfers at least the RPC over RDMA header, usually with the call transfers at least the RPC over RDMA header, usually with the call
or reply message appended, or at least some part thereof. Because or reply message appended, or at least some part thereof. Because
the size of what may be transmitted via RDMA Send is limited by the the size of what may be transmitted via RDMA Send is limited by the
size of the receiver's pre-posted buffer, the RPC over RDMA size of the receiver's pre-posted buffer, the RPC over RDMA
transport provides a number of methods to reduce the amount transport provides a number of methods to reduce the amount
transferred by means of the RDMA Send, when necessary, by transferred by means of the RDMA Send, when necessary, by
transferring various parts of the message using RDMA Read and RDMA transferring various parts of the message using RDMA Read and RDMA
Write. Write.
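The choice described in the paragraph above can be sketched roughly as follows. This is only an illustration, not anything defined by the draft: the function name, the parameter names, and the two labels "inline" and "chunked" are hypothetical, and the sizes used are arbitrary.

```python
def plan_transfer(header_len: int, message_len: int, recv_buffer_len: int) -> str:
    """Decide how an RPC message might be moved (illustrative only).

    Assumption: the only limit considered is the size of the receiver's
    pre-posted buffer, as the text above describes. All names here are
    hypothetical and do not come from the draft.
    """
    if header_len > recv_buffer_len:
        # Every transfer starts with an RDMA Send that carries at least
        # the RPC over RDMA header, so the header itself has to fit.
        raise ValueError("header does not fit the pre-posted buffer")
    if header_len + message_len <= recv_buffer_len:
        # The whole message can be appended to the header in one RDMA Send.
        return "inline"
    # Otherwise part of the message must move by RDMA Read or RDMA Write.
    return "chunked"
```

For instance, under these assumptions a 56-byte call with a 28-byte header fits a 1024-byte receive buffer, while an 8192-byte payload does not and would need to be moved partly by RDMA Read or RDMA Write.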
RPC over RDMA framing replaces all other RPC framing (such as TCP RPC over RDMA framing replaces all other RPC framing (such as TCP
record marking) when used atop an RPC/RDMA association, even though record marking) when used atop an RPC/RDMA association, even though
the underlying RDMA protocol may itself be layered atop a protocol the underlying RDMA protocol may itself be layered atop a protocol
with a defined RPC framing (such as TCP). An upper layer may with a defined RPC framing (such as TCP). It is however possible
however define an exchange to dynamically enable RPC/RDMA on an for RPC/RDMA to be dynamically enabled, in the course of
existing RPC association. Any such exchange must be carefully negotiating the use of RDMA via an upper layer exchange. Because
architected so as to prevent any ambiguity as to the framing in use RPC framing delimits an entire RPC request or reply, the resulting
for each side of the connection. Because RPC/RDMA framing delimits shift in framing must occur between distinct RPC messages, and in
an entire RPC request or reply, any such shift must occur between concert with the transport.
distinct RPC messages.
3.1. Short Messages 3.1. Short Messages
Many RPC messages are quite short. For example, the NFS version 3 Many RPC messages are quite short. For example, the NFS version 3
GETATTR request is only 56 bytes: 20 bytes of RPC header, plus a GETATTR request is only 56 bytes: 20 bytes of RPC header, plus a
32 byte file handle argument and 4 bytes of length. The reply to 32 byte file handle argument and 4 bytes of length. The reply to
this common request is about 100 bytes. this common request is about 100 bytes.
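The size quoted above is a simple sum of the three parts named in the text. A minimal check of that arithmetic, with hypothetical constant and function names chosen only for this illustration:

```python
# Component sizes as stated in the text above (hypothetical names).
RPC_HEADER_BYTES = 20     # RPC header
FILE_HANDLE_BYTES = 32    # file handle argument
LENGTH_FIELD_BYTES = 4    # length field


def getattr_request_size() -> int:
    """Total size of the example GETATTR request, in bytes."""
    return RPC_HEADER_BYTES + FILE_HANDLE_BYTES + LENGTH_FIELD_BYTES
```

Calling getattr_request_size() gives 56, matching the figure in the text.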
There is no benefit in transferring such small messages with an There is no benefit in transferring such small messages with an
RDMA Read or Write operation. The overhead in transferring RDMA Read or Write operation. The overhead in transferring
skipping to change at page 32, line 5 skipping to change at page 32, line 5
their security models. It is REQUIRED that any RDMA provider used their security models. It is REQUIRED that any RDMA provider used
for RPC transport be conformant to the requirements of [RFC5042] in for RPC transport be conformant to the requirements of [RFC5042] in
order to satisfy these protections. order to satisfy these protections.
Once delivered securely by the RDMA provider, any RDMA-exposed Once delivered securely by the RDMA provider, any RDMA-exposed
addresses will contain only RPC payloads in the chunk lists, addresses will contain only RPC payloads in the chunk lists,
transferred under the protection of RPCSEC_GSS integrity and transferred under the protection of RPCSEC_GSS integrity and
privacy. By these means, the data will be protected end-to-end, as privacy. By these means, the data will be protected end-to-end, as
required by the RPC layer security model. required by the RPC layer security model.
Where results are supplied to the requester via Read chunks, a Where upper layer protocols choose to supply results to the
server resource deficit can arise if the client does not promptly requester via Read chunks, a server resource deficit can arise if
acknowledge their status via the RDMA_DONE message. This can the client does not promptly acknowledge their status via the
potentially lead to a denial of service situation, with a single RDMA_DONE message. This can potentially lead to a denial of
client unfairly (and unnecessarily) consuming server RDMA service situation, with a single client unfairly (and
resources. Servers MUST protect against this situation, unnecessarily) consuming server RDMA resources. Servers for such
upper layer protocols MUST protect against this situation,
originating from one or many clients. For example, a time-based originating from one or many clients. For example, a time-based
window of buffer availability may be offered; if the client fails window of buffer availability may be offered; if the client fails
to obtain the data within the window, it will simply retry using to obtain the data within the window, it will simply retry using
ordinary RPC retry semantics. Or, a more severe method would be ordinary RPC retry semantics. Or, a more severe method would be
for the server to simply close the client's RDMA connection, for the server to simply close the client's RDMA connection,
freeing the RDMA resources and allowing the server to reclaim them. freeing the RDMA resources and allowing the server to reclaim them.
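One way the time-based window mentioned above might be tracked is sketched below. The draft does not prescribe any implementation, so the class, its methods, and the use of the RPC transaction id (xid) as the key are all assumptions made for illustration. The clock is passed in explicitly so the behaviour is easy to follow.

```python
class ReadChunkWindow:
    """Track how long exposed buffers stay available (illustrative only).

    Assumption: each exposed result is keyed by an RPC xid and becomes
    reclaimable once a fixed window has elapsed without RDMA_DONE.
    All names here are hypothetical.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._deadlines: dict[int, float] = {}  # xid -> expiry time

    def expose(self, xid: int, now: float) -> None:
        """Record that buffers for this xid were offered at time `now`."""
        self._deadlines[xid] = now + self.window_seconds

    def done(self, xid: int) -> bool:
        """Handle RDMA_DONE; return True if the buffers were still held."""
        return self._deadlines.pop(xid, None) is not None

    def reap(self, now: float) -> list[int]:
        """Reclaim and return the xids whose window has expired."""
        expired = sorted(x for x, d in self._deadlines.items() if d <= now)
        for xid in expired:
            del self._deadlines[xid]
        return expired
```

A server loop could call reap() periodically and release the associated RDMA resources for each returned xid. A client that missed the window would then retry using ordinary RPC retry semantics, as the text describes.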
A fairer and more useful method is provided by the protocol itself. A fairer and more useful method is provided by the protocol itself.
The server MAY use the rdma_credit value to limit the number of The server MAY use the rdma_credit value to limit the number of
outstanding requests for each client. By including the number of outstanding requests for each client. By including the number of
skipping to change at page 33, line 29 skipping to change at page 33, line 29
rdmaconfig 100400 rpc.rdmaconfig rdmaconfig 100400 rpc.rdmaconfig
Currently, neither the nc_proto netid's nor the RPC program numbers Currently, neither the nc_proto netid's nor the RPC program numbers
are assigned by IANA. The list in [RFC1833] has served as the are assigned by IANA. The list in [RFC1833] has served as the
netid registry, and the republication declared in [IANA-RPC] has netid registry, and the republication declared in [IANA-RPC] has
served as the program number registry. Ideally, IANA will create served as the program number registry. Ideally, IANA will create
explicit registries for these objects. However, in the absence of explicit registries for these objects. However, in the absence of
new registries, this document would serve as the repository for the new registries, this document would serve as the repository for the
RPC program number assignment, and the protocol netid. RPC program number assignment, and the protocol netid.
13. Acknowledgements 13. Acknowledgments
The authors wish to thank Rob Thurlow, John Howard, Chet Juszczak, The authors wish to thank Rob Thurlow, John Howard, Chet Juszczak,
Alex Chiu, Peter Staubach, Dave Noveck, Brian Pawlowski, Steve Alex Chiu, Peter Staubach, Dave Noveck, Brian Pawlowski, Steve
Kleiman, Mike Eisler, Mark Wittle, Shantanu Mehendale, David Kleiman, Mike Eisler, Mark Wittle, Shantanu Mehendale, David
Robinson and Mallikarjun Chadalapaka for their contributions to Robinson and Mallikarjun Chadalapaka for their contributions to
this document. this document.
14. Normative References 14. Normative References
[RFC2119] [RFC2119]
 End of changes. 8 change blocks. 
28 lines changed or deleted 27 lines changed or added
