draft-ietf-nfsv4-rfc5666-implementation-experience-01.txt   draft-ietf-nfsv4-rfc5666-implementation-experience-02.txt 
Network File System Version 4 C. Lever
Internet-Draft Oracle
Intended status: Informational April 8, 2016
Expires: October 10, 2016
RPC-over-RDMA Version One Implementation Experience
draft-ietf-nfsv4-rfc5666-implementation-experience-02
Abstract
This document details experiences and challenges implementing the
RPC-over-RDMA Version One protocol. Specification changes are
recommended to address avoidable interoperability failures.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
skipping to change at page 1, line 32
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 10, 2016.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Purpose Of This Document . . . . . . . . . . . . . . . . 3
1.2. Updating RFC 5666 . . . . . . . . . . . . . . . . . . . . 3
1.3. Requirements Language . . . . . . . . . . . . . . . . . . 4
2. RPC-Over-RDMA Essentials . . . . . . . . . . . . . . . . . . 4
2.1. Arguments And Results . . . . . . . . . . . . . . . . . . 4
2.2. Remote Direct Memory Access . . . . . . . . . . . . . . . 5
2.3. Transfer Models . . . . . . . . . . . . . . . . . . . . . 6
2.4. Upper Layer Binding Specifications . . . . . . . . . . . 7
2.5. On-The-Wire Protocol . . . . . . . . . . . . . . . . . . 8
3. Specification Issues . . . . . . . . . . . . . . . . . . . . 14
3.1. Extensibility Considerations . . . . . . . . . . . . . . 14
3.2. XDR Clarifications . . . . . . . . . . . . . . . . . . . 15
3.3. Additional XDR Issues . . . . . . . . . . . . . . . . . . 18
3.4. The Position Zero Read Chunk . . . . . . . . . . . . . . 19
3.5. RDMA_NOMSG Call Messages . . . . . . . . . . . . . . . . 21
3.6. RDMA_MSG Call with Position Zero Read Chunk . . . . . . . 22
3.7. Padding Inline Content After A Chunk . . . . . . . . . . 23
3.8. Write Chunk XDR Roundup . . . . . . . . . . . . . . . . . 25
3.9. Write List Error Cases . . . . . . . . . . . . . . . . . 27
4. Operational Considerations . . . . . . . . . . . . . . . . . 30
4.1. Computing Request Buffer Requirements . . . . . . . . . . 30
4.2. Default Inline Buffer Size . . . . . . . . . . . . . . . 31
4.3. When To Use Reply Chunks . . . . . . . . . . . . . . . . 31
4.4. Computing Credit Values . . . . . . . . . . . . . . . . . 32
4.5. Race Windows . . . . . . . . . . . . . . . . . . . . . . 33
4.6. Detection Of Unsupported Protocol Versions . . . . . . . 33
5. Pre-requisites For NFSv4 . . . . . . . . . . . . . . . . . . 34
5.1. Bi-directional Operation . . . . . . . . . . . . . . . . 34
6. Considerations For Upper Layer Binding Specifications . . . . 35
6.1. Organization Of Binding Specification Requirements . . . 35
6.2. RDMA-Eligibility . . . . . . . . . . . . . . . . . . . . 35
6.3. Inline Threshold Requirements . . . . . . . . . . . . . . 37
6.4. Violations Of Binding Rules . . . . . . . . . . . . . . . 38
6.5. Binding Specification Completion Assessment . . . . . . . 39
7. Unimplemented Protocol Features . . . . . . . . . . . . . . . 39
7.1. Unimplemented Features To Be Removed . . . . . . . . . . 39
7.2. Unimplemented Features To Be Retained . . . . . . . . . . 41
8. Security Considerations . . . . . . . . . . . . . . . . . . . 43
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 43
10. Appendix A: XDR Language Description . . . . . . . . . . . . 43
11. Appendix B: Binding Requirement Summary . . . . . . . . . . . 46
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 48
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 48
13.1. Normative References . . . . . . . . . . . . . . . . . . 48
13.2. Informative References . . . . . . . . . . . . . . . . . 49
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 49
1. Introduction
This document summarizes implementation experience with the RPC-over-
RDMA Version One protocol [RFC5666], and proposes improvements to the
protocol specification based on implementer experience, frequently-
asked questions, and interviews with a co-author of RFC 5666.
1.1. Purpose Of This Document
A key contribution of this document is to highlight areas of RFC 5666
where independent good-faith readings could result in distinct
implementations that do not interoperate with each other. Correcting
these specification issues is critical: fresh implementations of RPC-
over-RDMA Version One continue to arise.
Recommendations are limited to the following areas:
o Repairing specification ambiguities
o Codifying successful implementation practices and conventions
o Clarifying the role of Upper Layer Binding specifications
o Exploring protocol enhancements that might be added while allowing
extant implementations to interoperate with enhanced
implementations
1.2. Updating RFC 5666
During IETF 92, several alternatives for updating RFC 5666 were
discussed with the RFC Editor and with the assembled members of the
nfsv4 Working Group. Among them were:
o Filing individual errata for each issue
o Introducing a new RFC that updates, but does not obsolete, RFC 5666
and makes no change to the protocol
skipping to change at page 4, line 16
update and obsolete RFC 5666 while retaining a high degree of
interoperability with current RPC-over-RDMA Version One
implementations. This approach would avoid changes to on-the-wire
behavior without burdening implementers, who could continue to
reference a single specification of the protocol. In addition, this
alternative extends the life of current interoperable RPC-over-RDMA
Version One implementations in the field.
Subsequent discussion within the nfsv4 Working Group has focused on
resolving specification ambiguities that make the construction of
interoperable implementations unduly difficult. Future versions of
RPC-over-RDMA, in which deeper changes can be made and new
functionality introduced, remain a possibility.
1.3. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].
2. RPC-Over-RDMA Essentials
The following sections summarize the state of affairs defined in RFC
5666. This is a distillation of text from RFC 5666, dialog with a
co-author of RFC 5666, and implementer experience. The XDR
definitions are copied from RFC 5666 Section 4.3.
2.1. Arguments And Results
skipping to change at page 18, line 44
rpc_rdma_header does not comprise the entire RPC-over-RDMA header, it
should be renamed rpcrdma1_chunks to avoid confusion.
XDR definitions should be enclosed in CODE BEGINS and CODE ENDS
delimiters. An appropriate copyright block should accompany the XDR
definitions in RFC 5666bis. An XDR extraction shell script should be
provided in the text.
See Section 10 for a full listing of the proposed XDR definitions.
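RFC 5666 gives no way to pull its XDR out of the document text. The text asks for a shell script; the same filtering step is sketched here in Python. It assumes a "///" line-prefix convention of the kind some later IETF XDR specifications adopt; both the marker and the function name extract_xdr are assumptions of this sketch, not part of RFC 5666.

```python
def extract_xdr(document_text: str) -> str:
    """Return the XDR lines of a plain-text document with the marker removed.

    Assumed convention (hypothetical here): each XDR line is written as
    optional leading spaces, then "///", then a space and the XDR text.
    A line holding only "///" stands for a blank line in the XDR.
    """
    xdr_lines = []
    for line in document_text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped.startswith("///"):
            continue  # ordinary prose, not part of the XDR definitions
        body = stripped[3:]
        # Drop the single separator space but keep any further indentation.
        xdr_lines.append(body[1:] if body.startswith(" ") else body)
    return "".join(text + "\n" for text in xdr_lines)
```

Fed the plain-text draft, this yields a .x file that an XDR compiler such as rpcgen could then check, which is how a definition error that prevents compilation would surface.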
3.3. Additional XDR Issues
3.3.1. Mechanical Issues
There are some mechanical problems with the XDR language definition
of RPC-over-RDMA Version One provided in Section 4.3 of [RFC5666]:
o No copyright boilerplate is provided
o An extraction script is not provided, and there is no escape
sequence around the code
o There is at least one XDR definition error that prevents the
extracted XDR from compiling
3.3.2. XDR Definition Recursiveness
The usual practice when defining an XDR-based protocol is that there
is one encompassing data type that represents one message in the
protocol.
This is not true for RPC-over-RDMA. The header is defined by one
data type (struct rdma_msg) but the RPC message payload is not
formally represented in the XDR definition in Section 4.3. The
presence or absence of the RPC message payload is indicated by the
message type, and the body of that payload is noted only with a code
comment.
3.3.3. Recommendations
The XDR presented in RFC 5666bis should correct the deficiencies
described above.
To correct the lack of formal recursiveness without forcing an
on-the-wire behavior change, RFC 5666bis should place the RPC-over-
RDMA header and the RPC message payload in separate XDR streams.
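To make the two-stream recommendation concrete, here is a minimal Python sketch. It assumes only the rdma_msg layout from the RFC 5666 XDR and the fact that XDR optional-data is encoded as a 4-byte discriminator; the function names are hypothetical. The transport header and the RPC message are encoded as independent XDR streams and simply concatenated, so the bytes on the wire are unchanged.

```python
import struct

# rdma_proc value and protocol version from the RFC 5666 XDR.
RDMA_MSG = 0
RPCRDMA_VERSION = 1
HEADER_WORDS = 7  # xid, vers, credit, proc, Read list, Write list, Reply chunk


def encode_transport_header(xid: int, credits: int, proc: int) -> bytes:
    """XDR stream 1: an RPC-over-RDMA header whose three chunk lists are empty.

    Each absent list or chunk is encoded as an XDR optional-data
    discriminator of zero, giving a fixed 28-byte header.
    """
    return struct.pack(">7I", xid, RPCRDMA_VERSION, credits, proc, 0, 0, 0)


def build_rdma_msg(rpc_message: bytes, credits: int) -> bytes:
    """Concatenate the header stream and the RPC message stream."""
    if len(rpc_message) < 4 or len(rpc_message) % 4:
        raise ValueError("RPC message must be a non-empty, 4-byte aligned XDR stream")
    # XDR stream 2 is the RPC message itself; rdma_xid mirrors its XID.
    (xid,) = struct.unpack_from(">I", rpc_message, 0)
    return encode_transport_header(xid, credits, RDMA_MSG) + rpc_message


def split_rdma_msg(frame: bytes) -> tuple[tuple[int, ...], bytes]:
    """Decode stream 1; the bytes that follow form stream 2.

    Only valid for the chunk-free header built above; a real decoder would
    walk the chunk lists to find where the header ends.
    """
    header = struct.unpack_from(">7I", frame, 0)
    return header, frame[4 * HEADER_WORDS:]
```

The message type in the header stream is what tells a receiver whether a second stream follows inline, which is exactly the information the single-type XDR definition cannot express.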
3.4. The Position Zero Read Chunk
RFC 5666 Section 5.1 defines the operation of the Position Zero read
chunk. A requester uses the Position Zero read chunk in place of
inline content. A requester is required to use the Position Zero
read chunk when the total size of an RPC call message still exceeds
the size of the responder's receive buffers after RDMA-eligible data
has been removed from the message.
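The rule above can be sketched as a small decision function. This is a minimal Python sketch; the function and parameter names are hypothetical, and the buffer and header sizes in the usage note are illustrative values, not requirements of RFC 5666.

```python
from enum import Enum


class CallType(Enum):
    RDMA_MSG = 0     # RPC call message travels inline after the header
    RDMA_NOMSG = 1   # RPC call message travels in a Position Zero read chunk


def choose_call_type(transport_header_len: int,
                     reduced_call_len: int,
                     responder_inline_size: int) -> CallType:
    """Choose the call type once RDMA-eligible arguments are in read chunks.

    reduced_call_len is the RPC call message after that data was removed.
    If header plus message still exceed the responder's receive buffer,
    the message itself must move into a Position Zero read chunk. The
    transport header (with its chunk lists) is always sent inline.
    """
    if transport_header_len + reduced_call_len <= responder_inline_size:
        return CallType.RDMA_MSG
    return CallType.RDMA_NOMSG
```

For example, with a hypothetical 60-byte transport header and a 1024-byte receive buffer, a 964-byte reduced call fits inline, while a 965-byte call must be sent as RDMA_NOMSG.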
RFC 5666 Section 3.4 says:
skipping to change at page 21, line 10
read segment in Position Zero would limit the maximum size of RPC-
over-RDMA messages to a single page. Allowing multiple read
segments means the maximum message size grows with the number of
read chunks that can be sent in an RPC-over-RDMA header.
RFC 5666 does not limit the number of read segments in a read chunk,
nor does it limit the number of chunks that can appear in the Read
list. The Position Zero read chunk, despite its name, is not limited
to a single xdr_read_chunk.
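Because a Position Zero read chunk may contain several segments, a requester can describe a large RPC call message with a list of segments that all carry position zero. The sketch below assumes the whole message sits in one registered memory region identified by a single hypothetical handle, and caps each segment at a hypothetical 4096 bytes; real implementations choose segment sizes based on their memory registration strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadSegment:
    """One xdr_read_chunk entry: an XDR position plus an RDMA segment."""
    position: int
    handle: int
    length: int
    offset: int


def position_zero_chunk(message_len: int, handle: int, base_offset: int,
                        max_segment: int = 4096) -> list[ReadSegment]:
    """Describe a message of message_len bytes as one Position Zero read chunk.

    Every segment carries position 0; in order, they cover the message
    from base_offset onward without gaps.
    """
    segments = []
    done = 0
    while done < message_len:
        length = min(max_segment, message_len - done)
        segments.append(ReadSegment(0, handle, length, base_offset + done))
        done += length
    return segments
```

A 10000-byte message, for instance, becomes three segments of 4096, 4096, and 1808 bytes, all at position zero.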
3.4.1. Recommendations
RFC 5666bis should state that the guidelines in RFC 5666 Section 3.4
apply only to RDMA_MSG type calls. Where RFC 5666bis introduces the
Position Zero read chunk (RFC 5666 Section 5.1), it should enumerate
the differences between it and the read chunks previously described
in RFC 5666 Section 3.4.
RFC 5666bis should describe what restrictions an Upper Layer Binding
may make on Position Zero read chunks.
3.5. RDMA_NOMSG Call Messages
The second paragraph of RFC 5667 Section 4 says, in reference to
NFSv2 and NFSv3 WRITE and SYMLINK operations:
. . . a single RDMA Read list entry MAY be posted by the client to
supply the opaque file data for a WRITE request or the pathname
for a SYMLINK request. The server MUST ignore any Read list for
other NFS procedures, as well as additional Read list entries
beyond the first in the list.
skipping to change at page 22, line 15
However, there is a class of RPC operations where RDMA_NOMSG with
multiple read chunks is useful: when the body of an RPC call message
is larger than the inline buffer size, even after RDMA-eligible
argument data has been moved to read chunks.
A similar discussion applies to RDMA_NOMSG replies with large reply
bodies and RDMA-eligible result data. Such replies would use both
the Write list and the Reply chunk simultaneously. However, write
chunks do not have Position fields.
3.5.1. Recommendations
RFC 5666bis should continue to allow RDMA_NOMSG type calls with
additional read chunks. The rules about RDMA-eligibility in RFC
5666bis should discuss when the use of this construction is
beneficial, and when it should be avoided.
Authors of Upper Layer Bindings should be warned against ignoring
these cases. RFC 5666bis should provide a default behavior that
applies when Upper Layer Bindings omit this discussion.
3.6. RDMA_MSG Call with Position Zero Read Chunk
The first item in the header of both RPC calls and RPC replies is the
XID field [RFC5531]. RFC 5666 Section 4.1 says:
A header of message type RDMA_MSG or RDMA_MSGP MUST be followed by
the RPC call or RPC reply message body, beginning with the XID.
This strongly implies that the RPC header in an RDMA_MSG type
message starts at XDR position zero. Assume for a moment that, by
definition, the RPC header in an RPC-over-RDMA XDR stream starts at
skipping to change at page 23, line 7
just like the RPC header does. In an RDMA_NOMSG type call message,
which does not include an RPC header, a Position Zero read chunk
conveys the RPC header.
There is no prohibition in RFC 5666 against an RDMA_MSG type call
message with a Position Zero read chunk. However, it is not clear
how a responder should interpret such a message. RFC 5666 requires
the RPC header to start at XDR position zero, but there is a Position
Zero read chunk, which also starts at XDR position zero.
3.6.1. Recommendations
RFC 5666bis should clearly define what is meant by an XDR stream.
RFC 5666bis should state that the value in the xdr_read_chunk
"position" field is measured relative to the start of the RPC header,
which is the first byte of the header's XID field.
RFC 5666bis should prohibit requesters from providing a Position Zero
read chunk in RDMA_MSG type calls. Likewise, RFC 5666bis should
prohibit responders from utilizing a Reply chunk in RDMA_MSG type
replies.
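A responder enforcing these proposed rules might check incoming calls and its own replies along the following lines. This is a sketch only: it assumes the Read list has already been decoded into a list of Position values, and the function names are hypothetical. The check that an RDMA_NOMSG call carries a Position Zero read chunk follows from the observation that such a call has no inline RPC header.

```python
RDMA_MSG = 0     # rdma_proc values from the RFC 5666 XDR
RDMA_NOMSG = 1


def check_call(proc: int, read_positions: list[int]) -> None:
    """Reject calls that break the proposed RFC 5666bis rules.

    read_positions lists the Position field of every read segment,
    measured from the first byte of the RPC header's XID.
    """
    has_position_zero = 0 in read_positions
    if proc == RDMA_MSG and has_position_zero:
        raise ValueError("RDMA_MSG call carries a Position Zero read chunk")
    if proc == RDMA_NOMSG and not has_position_zero:
        raise ValueError("RDMA_NOMSG call has no Position Zero read chunk")


def check_reply(proc: int, uses_reply_chunk: bool) -> None:
    """Responder-side self-check before sending a reply."""
    if proc == RDMA_MSG and uses_reply_chunk:
        raise ValueError("RDMA_MSG reply must not use the Reply chunk")
```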
The diagrams in RFC 5666 Section 3.8 that number chunks starting
with 1 should be revised, because readers confuse this number with an
XDR position.
3.7. Padding Inline Content After A Chunk
To help clarify the discussion in this section, the term "read chunk"
here always means the new definition, where one or more read segments
that have identical values in their Position fields represent
exactly one RDMA-eligible XDR object.
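Under this definition, a receiver recovers read chunks from the flat Read list by grouping segments on their Position value. A minimal Python sketch, assuming the Read list has already been decoded into (position, handle, length, offset) tuples; the type and function names are illustrative.

```python
from typing import NamedTuple


class ReadSegment(NamedTuple):
    position: int  # XDR position of the object this segment helps convey
    handle: int
    length: int
    offset: int


def group_read_chunks(read_list: list[ReadSegment]) -> dict[int, list[ReadSegment]]:
    """Group a flat Read list into read chunks keyed by Position.

    Every segment sharing a Position value belongs to the same read chunk,
    i.e. the same RDMA-eligible XDR object. Chunks appear in the order
    their first segment was seen, and segment order within a chunk is kept.
    """
    chunks: dict[int, list[ReadSegment]] = {}
    for segment in read_list:
        chunks.setdefault(segment.position, []).append(segment)
    return chunks


def chunk_length(chunk: list[ReadSegment]) -> int:
    """Number of payload bytes carried by one read chunk."""
    return sum(segment.length for segment in chunk)
```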
A read chunk conveys a large argument payload via one or more RDMA
transfers. For instance, the data payload of an NFS WRITE operation
may be transferred using a read chunk [RFC5667].
skipping to change at page 25, line 5
any requirements for XDR padding and alignment when a read chunk is
followed in the XDR stream by more inline content.
Applying the rules of XDR, the XDR pad for the read chunk must not
appear in the inline content, even when the pad is not included in
the chunk itself either. This is because the inline content that
preceded the read chunk will have been padded to 4-byte alignment.
The next position in the inline buffer is already on a 4-byte
boundary, so no padding is necessary.
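The following Python sketch illustrates the rule for a variable-length opaque argument. It assumes, as a simplification, that the opaque's 4-byte length word stays in the inline stream and that the chunk's Position is the offset just past it; the helper names are hypothetical. The data moves to the read chunk, the pad is dropped entirely, and the next inline item lands on a 4-byte boundary without any inline padding.

```python
import struct


def xdr_roundup(n: int) -> int:
    """Round n up to a multiple of 4, the XDR alignment unit."""
    return (n + 3) & ~3


def move_opaque_to_read_chunk(stream: bytes, len_offset: int) -> tuple[bytes, int, bytes]:
    """Move one variable-length opaque out of an inline XDR stream.

    len_offset is where the opaque's 4-byte length word begins. The length
    word stays inline, the data goes to a read chunk whose Position is the
    offset just past the length word, and the XDR pad is sent nowhere.
    Returns (inline_bytes, chunk_position, chunk_data).
    """
    if len_offset % 4:
        raise ValueError("XDR objects start on 4-byte boundaries")
    (n,) = struct.unpack_from(">I", stream, len_offset)
    position = len_offset + 4
    data = stream[position:position + n]
    # Skip the data and its pad; whatever follows is already 4-byte aligned.
    remainder = stream[position + xdr_roundup(n):]
    return stream[:position] + remainder, position, data
```

With a 5-byte opaque between two 4-byte fields, the inline stream shrinks to three words (field, length, field) and the 3 pad bytes appear neither inline nor in the chunk.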
3.7.1. Recommendations
RFC 5666bis should state the above requirement in its equivalent of
RFC 5666 Section 3.7. When a responder forms a reply, the same
restriction applies to inline content interleaved with write chunks.
Because all XDR objects must start on an XDR alignment boundary, all
read and write chunks and all inline XDR objects in any XDR stream
must start on an XDR alignment boundary. This has implications for
the values allowed in read chunk Position fields, for how XDR roundup
works for chunks, and for how XDR objects are placed in inline
buffers. XDR alignment in inline buffers is always relative to
Position Zero (that is, where the RPC header starts).
3.8. Write Chunk XDR Roundup
The final paragraph of RFC 5666 Section 3.7 says:
For RDMA Write Chunks, a simpler encoding method applies. Again,
roundup bytes are not transferred, instead the chunk length sent
to the receiver in the reply is simply increased to include any
roundup.
A responder should avoid writing XDR pad bytes, as the requester's
upper layer does not reference them, though the language does not
skipping to change at page 26, line 46
These implementations may not be fully interoperable. The language
of Section 3.7 of [RFC5666] appears to allow all of this behavior (in
particular, it uses no RFC2119-style keywords to prohibit a responder
from writing the XDR pad, and does not require that requesters
register the extra space to accommodate the XDR pad).
Note that because the Reply chunk is a write chunk, these roundup
rules also apply to it.
3.7.1. Recommendations 3.8.1. Recommendations
The current specification allows XDR pad bytes to leak into user The current specification allows XDR pad bytes to leak into user
buffers, and none of the current implementations prevent this leak. buffers, and none of the current implementations prevent this leak.
There may be room to adjust the protocol specification independently There may be room to adjust the protocol specification independently
of current implementation behavior. of current implementation behavior.
RFC 5666bis should explicitly discuss the requirements around write RFC 5666bis should explicitly discuss the requirements around write
chunk roundup separately from the discussion of read chunk roundup. chunk roundup separately from the discussion of read chunk roundup.
Explicit RFC2119-style interoperability requirements should be Explicit RFC2119-style interoperability requirements should be
provided for write chunks. Responders MUST NOT write XDR pad bytes provided for write chunks. Responders MUST NOT write XDR pad bytes
at the end of a Write chunk. at the end of a Write chunk.
Allocating and registering extra space for XDR pad bytes that are Allocating and registering extra space for XDR pad bytes that are
never written is wasteful. RFC 5666bis should forbid it. Responders never written is wasteful. RFC 5666bis should forbid it. Responders
should not expect requesters to provide space for XDR pad bytes. should not expect requesters to provide space for XDR pad bytes.
3.8. Write List Error Cases 3.9. Write List Error Cases
RFC 5666 Section 3.6 says: RFC 5666 Section 3.6 says:
When a write chunk list is provided for the results of the RPC When a write chunk list is provided for the results of the RPC
call, the RPC server MUST provide any corresponding data via RDMA call, the RPC server MUST provide any corresponding data via RDMA
Write to the memory referenced in the chunk list entries. Write to the memory referenced in the chunk list entries.
This requires the responder to use the Write list when it is This requires the responder to use the Write list when it is
provided. Another way to say it is a responder is not permitted to provided. Another way to say it is a responder is not permitted to
return bulk data inline or in the Reply chunk when the requester has return bulk data inline or in the Reply chunk when the requester has
skipping to change at page 29, line 11 skipping to change at page 30, line 11
the reply. It should be the responsibility of the Upper Layer the reply. It should be the responsibility of the Upper Layer
Binding to avoid ambiguous situations by appropriately restricting Binding to avoid ambiguous situations by appropriately restricting
RDMA-eligible data items. RDMA-eligible data items.
Remember that a responder MUST use the Write list if the requester Remember that a responder MUST use the Write list if the requester
provided it and the responder has RDMA-eligible result data. If the provided it and the responder has RDMA-eligible result data. If the
requester has not provided enough Write chunks in the Write list, the requester has not provided enough Write chunks in the Write list, the
responder may have to use a long message as well, depending on the responder may have to use a long message as well, depending on the
remaining size of the RPC reply. remaining size of the RPC reply.
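The responder-side rule above can be sketched as a small decision function. This is an illustration under simplifying assumptions, not text from RFC 5666. It assumes that the reply carries a single RDMA-eligible data item, that any provided Write chunk is large enough to hold it, and that the inline threshold is compared against the remaining RPC reply length alone (ignoring the transport header). The names plan_reply, reply_plan, and reply_form are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum reply_form {
    REPLY_SHORT,       /* remaining reply fits inline (RDMA_MSG) */
    REPLY_LONG,        /* remaining reply is returned in the Reply chunk */
    REPLY_NO_RESOURCES /* requester provided no place to return it */
};

struct reply_plan {
    uint32_t via_write_list; /* bulk bytes moved by RDMA Write */
    uint32_t remaining_len;  /* bytes left in the RPC reply */
    enum reply_form form;
};

/* Hypothetical responder logic for a reply carrying one RDMA-eligible
 * item of bulk_len bytes plus other_len bytes of other results.
 * Assumes a provided Write chunk is always large enough for the item. */
static struct reply_plan plan_reply(uint32_t bulk_len, uint32_t other_len,
                                    unsigned write_chunks,
                                    bool have_reply_chunk,
                                    uint32_t inline_threshold)
{
    struct reply_plan p = { 0, other_len + bulk_len, REPLY_SHORT };

    /* The responder MUST use the Write list when the requester provided
     * one and the reply contains RDMA-eligible data. */
    if (write_chunks > 0 && bulk_len > 0) {
        p.via_write_list = bulk_len;
        p.remaining_len = other_len;
    }

    /* Whatever remains is sent inline if it fits, otherwise as a long
     * message using the Reply chunk. */
    if (p.remaining_len <= inline_threshold)
        p.form = REPLY_SHORT;
    else if (have_reply_chunk)
        p.form = REPLY_LONG;
    else
        p.form = REPLY_NO_RESOURCES;
    return p;
}
```

For example, an 8192-byte READ payload with 200 bytes of other results is a short reply when one Write chunk is provided and a 1024-byte inline threshold applies; without a Write list, the same reply needs a Reply chunk.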
3.8.1. Recommendations 3.9.1. Recommendations
RFC 5666bis should explicitly discuss responder behavior when an RPC RFC 5666bis should explicitly discuss responder behavior when an RPC
reply does not need to use a Write list entry provided by a reply does not need to use a Write list entry provided by a
requester. This is generic behavior, independent of any Upper Layer requester. This is generic behavior, independent of any Upper Layer
Binding. The explanation can be partially or wholly copied from RFC Binding. The explanation can be partially or wholly copied from RFC
5667 Section 5's discussion of NFSv4 COMPOUND. 5667 Section 5's discussion of NFSv4 COMPOUND.
A number of places in RFC 5666 Section 3.6 hint at how a responder A number of places in RFC 5666 Section 3.6 hint at how a responder
behaves when it is to return data that does not use every byte of behaves when it is to return data that does not use every byte of
every provided Write chunk segment. RFC 5666bis should state every provided Write chunk segment. RFC 5666bis should state
skipping to change at page 32, line 44 skipping to change at page 33, line 44
where the sum of posted and in-process receive buffers is less than where the sum of posted and in-process receive buffers is less than
its advertised credit limit. In either case, such a window could its advertised credit limit. In either case, such a window could
result in lost messages or be catastrophic for the transport result in lost messages or be catastrophic for the transport
connection. connection.
4.5.1. Recommendations 4.5.1. Recommendations
Clarify or remove the dependent clause in the section in RFC 5666bis Clarify or remove the dependent clause in the section in RFC 5666bis
that is equivalent to RFC 5666 Section 3.3. that is equivalent to RFC 5666 Section 3.3.
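The credit concern described above amounts to an invariant on receive buffer accounting: posted plus in-process receive buffers should never fall below the advertised credit limit. The sketch below expresses that invariant in C. It assumes a hypothetical per-connection structure (recv_acct) and hypothetical helpers. None of these names come from RFC 5666, and a real implementation would update these counters from asynchronous completion events.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection receive accounting. */
struct recv_acct {
    uint32_t posted;     /* receive buffers posted and not yet consumed */
    uint32_t in_process; /* consumed by a request still being handled */
    uint32_t advertised; /* credit limit last advertised to the peer */
};

/* The invariant: posted + in-process never drops below the credit limit. */
static bool credits_covered(const struct recv_acct *r)
{
    return (uint64_t)r->posted + r->in_process >= r->advertised;
}

/* A receive completion moves one buffer from posted to in-process. */
static void on_receive(struct recv_acct *r)
{
    r->posted--;
    r->in_process++;
}

/* When a request is finished, repost a receive buffer before releasing
 * the in-process slot, so the sum never dips below the limit between
 * the two steps. */
static void on_request_done(struct recv_acct *r)
{
    r->posted++;     /* e.g. after a successful ibv_post_recv() */
    r->in_process--;
}

/* Hypothetical: largest credit value that is safe to advertise now. */
static uint32_t safe_credit_grant(const struct recv_acct *r, uint32_t wanted)
{
    uint64_t have = (uint64_t)r->posted + r->in_process;
    return wanted < have ? wanted : (uint32_t)have;
}
```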
4.6. Detection Of Unsupported Protocol Versions
Section 4.2 of [RFC5666] is explicit about how a responder must
handle RPC-over-RDMA messages that carry an unrecognized RPC-over-
RDMA protocol version:
When a peer receives an RPC RDMA message, it MUST perform the
following basic validity checks on the header and chunk contents.
If such errors are detected in the request, an RDMA_ERROR reply
MUST be generated.
When the peer detects an RPC-over-RDMA header version that it does
not support (currently this document defines only version 1), it
replies with an error code of ERR_VERS, and provides the low and
high inclusive version numbers it does, in fact, support. The
version number in this reply MUST be any value otherwise valid at
the receiver.
However, one widely deployed RPC-over-RDMA Version One server
implementation is known to discard requests that do not contain the
value one (1) in their rdma_vers field. This server implementation
does not reply with RDMA_ERROR / RDMA_ERR_VERS in this case.
Without a proper protocol version detection mechanism, it is not
possible for RPC-over-RDMA Version One implementations to
interoperate with implementations that support newer protocol
versions.
4.6.1. Recommendations
RPC-over-RDMA Version One implementations that discard non-Version
One requests without an error response are considered non-compliant
with [RFC5666]. No changes to the specification are needed.
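For comparison with the non-compliant behavior described above, this C sketch shows a responder that checks rdma_vers and encodes an RDMA_ERROR reply with error code ERR_VERS instead of discarding the request. The numeric values (RDMA_ERROR = 4, ERR_VERS = 1) and the field order follow the XDR definition in RFC 5666. The function name, the supported version range, and the caller-chosen credit value placed in the reply are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the RFC 5666 XDR definition. */
enum { RDMA_ERROR = 4 }; /* rdma_proc */
enum { ERR_VERS = 1 };   /* rpc_rdma_errcode */

/* Hypothetical: the version range this implementation supports. */
#define RPCRDMA_VERS_LOW  1u
#define RPCRDMA_VERS_HIGH 1u

#define ERR_VERS_REPLY_LEN 28u /* seven XDR words */

enum vers_check { VERS_OK, VERS_SEND_ERR_VERS, VERS_TOO_SHORT };

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical: check rdma_vers (the second XDR word of the transport
 * header). On an unsupported version, encode RDMA_ERROR / ERR_VERS into
 * reply[] so the caller can send it rather than drop the request. */
static enum vers_check check_rdma_vers(const uint8_t *hdr, size_t len,
                                       uint32_t credits,
                                       uint8_t reply[ERR_VERS_REPLY_LEN])
{
    if (len < 8)
        return VERS_TOO_SHORT; /* rdma_vers cannot even be read */

    uint32_t vers = get_be32(hdr + 4);
    if (vers >= RPCRDMA_VERS_LOW && vers <= RPCRDMA_VERS_HIGH)
        return VERS_OK;

    put_be32(reply + 0, get_be32(hdr));      /* rdma_xid: echo request */
    put_be32(reply + 4, RPCRDMA_VERS_LOW);   /* a version valid here */
    put_be32(reply + 8, credits);            /* rdma_credit */
    put_be32(reply + 12, RDMA_ERROR);        /* rdma_proc */
    put_be32(reply + 16, ERR_VERS);          /* error code */
    put_be32(reply + 20, RPCRDMA_VERS_LOW);  /* rdma_vers_low */
    put_be32(reply + 24, RPCRDMA_VERS_HIGH); /* rdma_vers_high */
    return VERS_SEND_ERR_VERS;
}
```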
5. Pre-requisites For NFSv4 5. Pre-requisites For NFSv4
5.1. Bi-directional Operation 5.1. Bi-directional Operation
NFSv4.1 moves the backchannel onto the same transport as forward NFSv4.1 moves the backchannel onto the same transport as forward
requests [RFC5661]. Typically RPC client endpoints do not expect to requests [RFC5661]. Typically RPC client endpoints do not expect to
receive RPC call messages. To support NFSv4.1 callback operations, receive RPC call messages. To support NFSv4.1 callback operations,
client and server implementations must be updated to support bi- client and server implementations must be updated to support bi-
directional operation. directional operation.
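Supporting callbacks means a client endpoint must distinguish a reply to one of its own calls from an incoming backchannel call. One way to do this, sketched below in C, is to examine the msg_type field of the ONC RPC header (CALL = 0, REPLY = 1, per RFC 5531), which immediately follows the xid. The function and enum names are hypothetical. The sketch also assumes the RPC message has already been located after the RPC-over-RDMA transport header.

```c
#include <stddef.h>
#include <stdint.h>

/* msg_type values from the ONC RPC message definition (RFC 5531). */
enum { RPC_CALL = 0, RPC_REPLY = 1 };

enum rpc_direction {
    FORWARD_REPLY,    /* reply to a call this endpoint sent */
    BACKCHANNEL_CALL, /* call from the peer, e.g. an NFSv4.1 callback */
    NOT_RPC           /* too short, or unknown msg_type */
};

/* Hypothetical: classify an RPC message received by a client endpoint.
 * msg points at the start of the RPC message; the first XDR word is the
 * xid and the second is msg_type. */
static enum rpc_direction classify_rpc(const uint8_t *msg, size_t len)
{
    if (len < 8)
        return NOT_RPC;

    uint32_t msg_type = ((uint32_t)msg[4] << 24) | ((uint32_t)msg[5] << 16) |
                        ((uint32_t)msg[6] << 8) | (uint32_t)msg[7];
    switch (msg_type) {
    case RPC_REPLY:
        return FORWARD_REPLY;
    case RPC_CALL:
        return BACKCHANNEL_CALL;
    default:
        return NOT_RPC;
    }
}
```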
skipping to change at page 38, line 41 skipping to change at page 40, line 19
7.1.2. Read-Read Transfer Model 7.1.2. Read-Read Transfer Model
All existing RPC-over-RDMA Version One implementations use a Read- All existing RPC-over-RDMA Version One implementations use a Read-
Write data transfer model. The server endpoint is responsible for Write data transfer model. The server endpoint is responsible for
initiating all RDMA data transfers. The Read-Read transfer model has initiating all RDMA data transfers. The Read-Read transfer model has
been deprecated, but because it appears in RFC 5666, implementations been deprecated, but because it appears in RFC 5666, implementations
are still responsible for supporting it. By removing the are still responsible for supporting it. By removing the
specification and discussion of Read-Read, the protocol and specification and discussion of Read-Read, the protocol and
specification can be made simpler and more clear. specification can be made simpler and more clear.
Once the Read-Read transfer model is no longer supported, a responder
would no longer be allowed to send a Read list to a requester.
Sending a Read list would be needed if a requester has not provided
enough memory space in the form of a Reply chunk or Write list to
receive a large RPC Reply.
There is currently no mechanism in the RPC-over-RDMA Version One
protocol for a responder to indicate that inadequate reply buffer
resources were provided by a requester. Therefore, requesters should
be fully responsible for providing all necessary memory resources to
receive each RPC reply, including a properly populated Write list
and/or a Reply chunk.
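Because a responder has no way to report that reply resources were inadequate, the requester has to size them before sending the call. The C sketch below illustrates one way a requester might do this. It assumes the requester knows the maximum reply size and the maximum RDMA-eligible portion for the procedure (for example, from the Upper Layer Binding and the call's arguments), and that a single Write chunk suffices. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct reply_resources {
    unsigned write_chunks;    /* Write list entries to provide */
    uint32_t write_chunk_len; /* bytes to register for the Write chunk */
    bool     reply_chunk;     /* provide a Reply chunk? */
    uint32_t reply_chunk_len; /* bytes to register for the Reply chunk */
};

/* Hypothetical requester logic. max_reply_len is the largest possible
 * RPC reply for the procedure; max_bulk_len is the RDMA-eligible part
 * of it (0 if none). Both are assumed known before the call is sent. */
static struct reply_resources plan_reply_resources(uint32_t max_reply_len,
                                                   uint32_t max_bulk_len,
                                                   uint32_t inline_threshold)
{
    struct reply_resources r = { 0, 0, false, 0 };

    if (max_bulk_len > max_reply_len)
        max_bulk_len = max_reply_len;
    if (max_bulk_len > 0) {
        r.write_chunks = 1;
        r.write_chunk_len = max_bulk_len; /* no extra space for XDR pad */
    }

    /* Whatever the Write list does not carry must fit inline or in a
     * Reply chunk; the responder cannot ask for more space. */
    uint32_t rest = max_reply_len - max_bulk_len;
    if (rest > inline_threshold) {
        r.reply_chunk = true;
        r.reply_chunk_len = rest;
    }
    return r;
}
```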
7.1.2.1. Recommendations 7.1.2.1. Recommendations
Remove Read-Read from RFC 5666bis, in particular from its equivalent Remove Read-Read from RFC 5666bis, in particular from its equivalent
of RFC 5666 Section 3.8. RFC 5666bis should require implementations of RFC 5666 Section 3.8. RFC 5666bis should require implementations
not to send RDMA_DONE; an implementation receiving it should ignore not to send RDMA_DONE; an implementation receiving it should ignore
it. The XDR definition should reserve RDMA_DONE. it. The XDR definition should reserve RDMA_DONE. RFC 5666bis should
explicitly state requirements for requesters to allocate and prepare
reply buffer resources for each RPC-over-RDMA message.
7.1.3. RDMA_MSGP 7.1.3. RDMA_MSGP
It has been observed that the current specification of RDMA_MSGP is It has been observed that the current specification of RDMA_MSGP is
not clear enough to result in interoperable implementations. not clear enough to result in interoperable implementations.
Possibly as a result, current receive endpoints do recognize and Possibly as a result, current receive endpoints do recognize and
process RDMA_MSGP messages, though they do not take advantage of the process RDMA_MSGP messages, though they do not take advantage of the
passed alignment parameters. Receivers treat RDMA_MSGP messages like passed alignment parameters. Receivers treat RDMA_MSGP messages like
RDMA_MSG messages. RDMA_MSG messages.
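The receive-side behavior described in this section and in the Read-Read recommendations above can be summarized as a dispatch on rdma_proc. In this C sketch, the rdma_proc values follow RFC 5666. RDMA_MSGP is treated like RDMA_MSG, as current receivers are described as doing, and RDMA_DONE is ignored, as recommended for RFC 5666bis. The action names and the handling of unrecognized procedure values are assumptions.

```c
#include <stdint.h>

/* rdma_proc values from the RFC 5666 XDR definition. */
enum rdma_proc {
    RDMA_MSG = 0,
    RDMA_NOMSG = 1,
    RDMA_MSGP = 2,
    RDMA_DONE = 3,
    RDMA_ERROR = 4
};

/* Hypothetical actions a receiver might take. */
enum recv_action {
    ACT_INLINE_RPC,    /* RPC message follows the transport header inline */
    ACT_RPC_IN_CHUNKS, /* RPC message is carried entirely in chunks */
    ACT_IGNORE,        /* consume the receive and do nothing else */
    ACT_HANDLE_ERROR,  /* process an RDMA_ERROR from the peer */
    ACT_UNRECOGNIZED   /* assumption: treat as a malformed header */
};

static enum recv_action classify_proc(uint32_t proc)
{
    switch (proc) {
    case RDMA_MSG:
    case RDMA_MSGP:  /* rdma_align and rdma_thresh are parsed, not used */
        return ACT_INLINE_RPC;
    case RDMA_NOMSG: /* position-zero Read chunk or Reply chunk */
        return ACT_RPC_IN_CHUNKS;
    case RDMA_DONE:  /* meaningful only with Read-Read; ignore it */
        return ACT_IGNORE;
    case RDMA_ERROR:
        return ACT_HANDLE_ERROR;
    default:
        return ACT_UNRECOGNIZED;
    }
}
```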
End of changes. 25 change blocks.
65 lines changed or deleted 154 lines changed or added
