draft-ietf-nfsv4-rpcrdma-bidirection-03.txt   draft-ietf-nfsv4-rpcrdma-bidirection-04.txt 
Network File System Version 4 C. Lever Network File System Version 4 C. Lever
Internet-Draft Oracle Internet-Draft Oracle
Intended status: Standards Track May 2, 2016 Intended status: Standards Track May 27, 2016
Expires: November 3, 2016 Expires: November 28, 2016
Bi-directional Remote Procedure Call On RPC-over-RDMA Transports Bi-directional Remote Procedure Call On RPC-over-RDMA Transports
draft-ietf-nfsv4-rpcrdma-bidirection-03 draft-ietf-nfsv4-rpcrdma-bidirection-04
Abstract Abstract
Recent minor versions of NFSv4 work best when ONC RPC transports can Minor versions of NFSv4 newer than NFSv4.0 work best when ONC RPC
send Remote Procedure Call transactions in both directions on the transports can send Remote Procedure Call transactions in both
same connection. This document describes how RPC-over-RDMA transport directions on the same connection. This document describes how RPC-
endpoints convey RPCs in both directions on a single connection. over-RDMA transport endpoints convey RPCs in both directions on a
single connection.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 3, 2016. This Internet-Draft will expire on November 28, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 14 skipping to change at page 2, line 14
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. Understanding RPC Direction . . . . . . . . . . . . . . . . . 3 2. Understanding RPC Direction . . . . . . . . . . . . . . . . . 3
2.1. Forward Direction . . . . . . . . . . . . . . . . . . . . 3 2.1. Forward Direction . . . . . . . . . . . . . . . . . . . . 3
2.2. Backward Direction . . . . . . . . . . . . . . . . . . . 4 2.2. Backward Direction . . . . . . . . . . . . . . . . . . . 4
2.3. Bi-directional Operation . . . . . . . . . . . . . . . . 4 2.3. Bi-directional Operation . . . . . . . . . . . . . . . . 4
2.4. XID Values . . . . . . . . . . . . . . . . . . . . . . . 4 2.4. XID Values . . . . . . . . . . . . . . . . . . . . . . . 4
3. Rationale For Bi-Directional RPC-over-RDMA . . . . . . . . . 5 3. Immediate Uses Of Bi-Directional RPC-over-RDMA . . . . . . . 5
3.1. NFSv4.0 Callback Operation . . . . . . . . . . . . . . . 5 3.1. NFSv4.0 Callback Operation . . . . . . . . . . . . . . . 5
3.2. NFSv4.1 Callback Operation . . . . . . . . . . . . . . . 6 3.2. NFSv4.1 Callback Operation . . . . . . . . . . . . . . . 6
4. Flow Control . . . . . . . . . . . . . . . . . . . . . . . . 6 4. Flow Control . . . . . . . . . . . . . . . . . . . . . . . . 6
4.1. Backward Credits . . . . . . . . . . . . . . . . . . . . 7 4.1. Backward Credits . . . . . . . . . . . . . . . . . . . . 7
4.2. Managing Receive Buffers . . . . . . . . . . . . . . . . 7 4.2. Inline Thresholds . . . . . . . . . . . . . . . . . . . . 7
5. Protocol For Backward Operation . . . . . . . . . . . . . . . 8 4.3. Managing Receive Buffers . . . . . . . . . . . . . . . . 7
5.1. Sending A Backward Direction Call . . . . . . . . . . . . 8 5. Sending And Receiving Backward Operations . . . . . . . . . . 8
5.1. Sending A Backward Direction Call . . . . . . . . . . . . 9
5.2. Sending A Backward Direction Reply . . . . . . . . . . . 9 5.2. Sending A Backward Direction Reply . . . . . . . . . . . 9
5.3. Backward Direction Chunks . . . . . . . . . . . . . . . . 9 5.3. Backward Direction Chunks . . . . . . . . . . . . . . . . 9
5.4. Backward Direction Retransmission . . . . . . . . . . . . 10 5.4. Backward Direction Retransmission . . . . . . . . . . . . 10
6. In the Absence of Backward Direction Support . . . . . . . . 10 6. In the Absence of Backward Direction Support . . . . . . . . 11
7. Backward Direction Upper Layer Binding . . . . . . . . . . . 11 7. Considerations For Upper Layer Bindings . . . . . . . . . . . 11
8. Security Considerations . . . . . . . . . . . . . . . . . . . 11 8. Security Considerations . . . . . . . . . . . . . . . . . . . 12
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12
11. Normative References . . . . . . . . . . . . . . . . . . . . 12 11. Normative References . . . . . . . . . . . . . . . . . . . . 12
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
The purpose of this document is to enable bi-directional RPC The purpose of this document is to enable concurrent operation in
operation on RPC-over-RDMA protocol versions that do not have both directions on a single transport connection using RPC-over-RDMA
specific protocol facilities for backward direction operation. protocol versions that do not have specific facilities for backward
Backward direction RPC transactions enable the operation of NFSv4.1, direction operation.
and in particular pNFS.
For example, using the protocol described in this document, RPC Backward direction RPC transactions are necessary for the operation
transactions can be conveyed in both directions on the same RPC-over- of NFSv4.1, and in particular, of pNFS, though any Upper Layer
RDMA Version One connection without changes to the Version One header Protocol implementation may make use of them. An Upper Layer Binding
XDR description. Therefore this document does not update for NFSv4.x callback operation is additionally required (see
[I-D.ietf-nfsv4-rfc5666bis]. Section 7), but is not provided in this document.
Providing an Upper Layer Binding for NFSv4.x callback operations is For example, using the approach described herein, RPC transactions
outside the scope of this document. can be conveyed in both directions on the same RPC-over-RDMA Version
One connection without changes to the XDR description of RPC-
over-RDMA Version One. This document does not modify the XDR or
protocol described in [I-D.ietf-nfsv4-rfc5666bis]. Future versions
of RPC-over-RDMA may adopt the approach described herein, or may
replace it with a different approach.
1.1. Requirements Language 1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
2. Understanding RPC Direction 2. Understanding RPC Direction
The ONC RPC protocol as described in [RFC5531] is fundamentally a The ONC RPC protocol as described in [RFC5531] is architected as a
message-passing protocol between one server and one or more clients. message-passing protocol between one server and one or more clients.
ONC RPC transactions are made up of two types of messages. ONC RPC transactions are made up of two types of messages.
A CALL message, or "Call", requests work. A Call is designated by A CALL message, or "Call", requests work. A Call is designated by
the value CALL in the message's msg_type field. An arbitrary unique the value CALL in the message's msg_type field. An arbitrary unique
value is placed in the message's xid field. A host that originates a value is placed in the message's xid field. A host that originates a
Call is referred to in this document as a "Requester." Call is referred to in this document as a "Requester."
A REPLY message, or "Reply", reports the results of work requested by A REPLY message, or "Reply", reports the results of work requested by
a Call. A Reply is designated by the value REPLY in the message's a Call. A Reply is designated by the value REPLY in the message's
msg_type field. The value contained in the message's xid field is msg_type field. The value contained in the message's xid field is
copied from the Call whose results are being returned. A host that copied from the Call whose results are being returned. A host that
emits a Reply is referred to as a "Responder." emits a Reply is referred to as a "Responder."
Typically, a Call generates a corresponding Reply. A Reply is never Typically, a Call results in a corresponding Reply. A Reply is never
sent without a corresponding Call. sent without a corresponding Call.
RPC-over-RDMA is a connection-oriented RPC transport. When a RPC-over-RDMA is a connection-oriented RPC transport. In all cases,
connection-oriented transport is used, ONC RPC client endpoints are when a connection-oriented transport is used, ONC RPC client
responsible for initiating transport connections, while ONC RPC endpoints are responsible for initiating transport connections, while
service endpoints wait passively for incoming connection requests. ONC RPC service endpoints passively await incoming connection
requests.
RPC direction on connectionless RPC transports is not considered in RPC direction on connectionless RPC transports is not addressed in
this document. this document.
2.1. Forward Direction 2.1. Forward Direction
A traditional ONC RPC client is always a Requester. A traditional Traditionally, an ONC RPC client acts as a Requester, while an ONC
ONC RPC service is always a Responder. This traditional form of ONC RPC service acts as a Responder. This form of message passing is
RPC message passing is referred to as operation in the "forward referred to as "forward direction" operation.
direction."
During forward direction operation, the ONC RPC client is responsible
for establishing transport connections.
2.2. Backward Direction 2.2. Backward Direction
The ONC RPC specification [RFC5531] does not forbid passing messages The ONC RPC specification [RFC5531] does not forbid passing messages
in the other direction. An ONC RPC service endpoint can act as a in the other direction. An ONC RPC service endpoint can act as a
Requester, in which case an ONC RPC client endpoint acts as a Requester, in which case an ONC RPC client endpoint acts as a
Responder. This form of message passing is referred to as operation Responder. This form of message passing is referred to as "backward
in the "backward direction." direction" operation.
During backward direction operation, the ONC RPC client is During backward direction operation, the ONC RPC client is
responsible for establishing transport connections, even though ONC responsible for establishing transport connections, even though ONC
RPC Calls come from the ONC RPC server. RPC Calls come from the ONC RPC server.
ONC RPC clients and services are optimized to perform and scale well ONC RPC clients and services are optimized to perform and scale well
while handling traffic in the forward direction, and may not be while handling traffic in the forward direction, and may not be
prepared to handle operation in the backward direction. Not until prepared to handle operation in the backward direction. Not until
recently has there been a need to handle backward direction NFSv4.1 [RFC5661] has there been a strong need to handle backward
operation. direction operation.
2.3. Bi-directional Operation 2.3. Bi-directional Operation
A pair of connected RPC endpoints may choose to use only forward or A pair of connected RPC endpoints may choose to use only forward or
only backward direction operations on a particular transport. Or, only backward direction operations on a particular transport. Or,
these endpoints may send Calls in both directions concurrently on the these endpoints may send Calls in both directions concurrently on the
same transport. same transport.
"Bi-directional operation" occurs when both transport endpoints act "Bi-directional operation" occurs when both transport endpoints act
as a Requester and a Responder at the same time. As above, the ONC as a Requester and a Responder at the same time.
RPC client is always responsible for establishing transport
connections. Bi-directionality is an extension of RPC transport connection
sharing. Two RPC endpoints wish to exchange independent RPC messages
over a shared connection, but in opposite directions. These messages
may or may not be related to the same workloads or RPC Programs.
2.4. XID Values 2.4. XID Values
Section 9 of [RFC5531] introduces the ONC RPC transaction identifier, Section 9 of [RFC5531] introduces the ONC RPC transaction identifier,
or "xid" for short. The value of an xid is interpreted in the or "xid" for short. The value of an xid is interpreted in the
context of the message's msg_type field. context of the message's msg_type field.
o The xid of a Call is arbitrary but is unique among outstanding o The xid of a Call is arbitrary but is unique among outstanding
Calls from that Requester. Calls from that Requester.
skipping to change at page 5, line 17 skipping to change at page 5, line 17
During bi-directional operation, forward and backward direction XIDs During bi-directional operation, forward and backward direction XIDs
are typically generated on distinct hosts by possibly different are typically generated on distinct hosts by possibly different
algorithms. There is no co-ordination between forward and backward algorithms. There is no co-ordination between forward and backward
direction XID generation. direction XID generation.
Therefore, a forward direction Requester MAY use the same xid value Therefore, a forward direction Requester MAY use the same xid value
at the same time as a backward direction Requester on the same at the same time as a backward direction Requester on the same
transport connection. Though such concurrent requests use the same transport connection. Though such concurrent requests use the same
xid value, they represent distinct ONC RPC transactions. xid value, they represent distinct ONC RPC transactions.
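
As an illustration of why a shared xid value is harmless, the following sketch keeps Calls an endpoint has sent separate from Calls it has received. The Endpoint class, its method names, and the string labels are hypothetical and are not part of any specification; only the CALL and REPLY values come from [RFC5531].

```python
CALL, REPLY = 0, 1  # msg_type values from RFC 5531


class Endpoint:
    """Hypothetical endpoint that acts as Requester and Responder at once."""

    def __init__(self):
        self.sent_calls = {}      # xid -> label, Calls this endpoint originated
        self.received_calls = {}  # xid -> label, Calls the peer originated

    def send_call(self, xid, label):
        # An xid only has to be unique among this Requester's outstanding Calls.
        if xid in self.sent_calls:
            raise ValueError("xid already outstanding from this Requester")
        self.sent_calls[xid] = label

    def receive(self, msg_type, xid, label=None):
        if msg_type == CALL:
            # The peer is the Requester; its xid space is independent of ours.
            self.received_calls[xid] = label
            return ("call", label)
        # A Reply completes one of our own outstanding Calls.
        return ("reply", self.sent_calls.pop(xid))


client = Endpoint()
client.send_call(7, "forward NFS request")
incoming = client.receive(CALL, 7, "backward callback")
completed = client.receive(REPLY, 7)
```

Because the msg_type field selects which table is consulted, the forward Call and the backward Call that both carry xid 7 never collide.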
3. Rationale For Bi-Directional RPC-over-RDMA 3. Immediate Uses Of Bi-Directional RPC-over-RDMA
3.1. NFSv4.0 Callback Operation 3.1. NFSv4.0 Callback Operation
An NFSv4.0 client employs a traditional ONC RPC client to send NFS An NFSv4.0 client employs a traditional ONC RPC client to send NFS
requests to an NFSv4.0 server's traditional ONC RPC service requests to an NFSv4.0 server's traditional ONC RPC service
[RFC7530]. NFSv4.0 requests flow in the forward direction on a [RFC7530]. NFSv4.0 requests flow in the forward direction on a
connection established by the client. This connection is referred to connection established by the client. This connection is referred to
as a "forechannel" connection. as a "forechannel" connection.
An NFSv4 "delegation" is simply a promise made by a server that it An NFSv4 "delegation" is simply a promise made by a server that it
will notify a client before another agent is allowed access to a will notify a client before another client or program running on the
file. With this guarantee, that client can operate as sole accessor server is allowed access to a file. With this guarantee, that client
of the file. In particular, it can manage the file's data and can operate as sole accessor of the file. In particular, it can
metadata caches aggressively. manage the file's data and metadata caches aggressively.
To administer file delegations, NFSv4.0 introduces the use of To administer file delegations, NFSv4.0 introduces the use of
callback operations, or "callbacks", in Section 10.2 of [RFC7530]. callback operations, or "callbacks", in Section 10.2 of [RFC7530].
An NFSv4.0 server sets up a traditional ONC RPC client, and an An NFSv4.0 server sets up a traditional ONC RPC client, and an
NFSv4.0 client sets up a traditional ONC RPC service. Callbacks flow NFSv4.0 client sets up a traditional ONC RPC service. Callbacks flow
in the forward direction on a connection established between the in the forward direction on a connection established between the
server's callback client, and the client's callback server. This server's callback client, and the client's callback server. This
connection is distinct from connections being used as forechannels, connection is distinct from connections being used as forechannels,
and is referred to as a "backchannel connection." and is referred to as a "backchannel connection."
When an RDMA transport is used as a forechannel, an NFSv4.0 client When an RDMA transport is used as a forechannel, an NFSv4.0 client
typically provides a TCP callback service. The client's SETCLIENTID typically provides a TCP callback service. The client's SETCLIENTID
operation advertises the callback service endpoint with a "tcp" or operation advertises the callback service endpoint with a "tcp" or
"tcp6" netid. The server then connects to this service using a TCP "tcp6" netid. The server then connects to this service using a TCP
socket. socket.
NFSv4.0 implementations are fully functional without a backchannel in NFSv4.0 implementations can function without a backchannel in place.
place. In this case, the server does not grant file delegations. In this case, the server does not grant file delegations. This might
This might result in a negative performance effect, but functional result in a negative performance effect, but correctness is not
correctness is unaffected. affected.
3.2. NFSv4.1 Callback Operation 3.2. NFSv4.1 Callback Operation
NFSv4.1 supports file delegation in a similar fashion to NFSv4.0, and NFSv4.1 supports file delegation in a similar fashion to NFSv4.0, and
extends the callback mechanism to manage pNFS layouts, as discussed extends the callback mechanism to manage pNFS layouts, as discussed
in Section 12 of [RFC5661]. in Section 12 of [RFC5661].
To facilitate operation through NAT routers, all NFSv4.1 transport To facilitate operation through NAT routers, all NFSv4.1 transport
connections are initiated by NFSv4.1 clients. Therefore NFSv4.1 connections are initiated by NFSv4.1 clients. Therefore NFSv4.1
servers send callbacks to clients in the backward direction on servers send callbacks to clients in the backward direction on
skipping to change at page 6, line 31 skipping to change at page 6, line 31
NFSv4.1 clients may establish distinct transport connections for NFSv4.1 clients may establish distinct transport connections for
forechannel and backchannel operation, or they may combine forechannel and backchannel operation, or they may combine
forechannel and backchannel operation on one transport connection forechannel and backchannel operation on one transport connection
using bi-directional operation. using bi-directional operation.
Without a backward direction RPC-over-RDMA capability, an NFSv4.1 Without a backward direction RPC-over-RDMA capability, an NFSv4.1
client must additionally connect using a transport with backward client must additionally connect using a transport with backward
direction capability to use as a backchannel. TCP is the only choice direction capability to use as a backchannel. TCP is the only choice
for an NFSv4.1 backchannel connection in this case. for an NFSv4.1 backchannel connection in this case.
Some implementations find it more convenient to use a single combined Implementations often find it more convenient to use a single
transport (ie. a transport that is capable of bi-directional combined transport (ie. a transport that is capable of bi-directional
operation). This simplifies connection establishment and recovery operation). This simplifies connection establishment and recovery
during network partitions or when one endpoint restarts. during network partitions or when one endpoint restarts. This can
also enable better scaling by using fewer transport connections to
perform the same work.
As with NFSv4.0, if a backchannel is not in use, an NFSv4.1 server As with NFSv4.0, if a backchannel is not in use, an NFSv4.1 server
does not grant delegations. But because of its reliance on callbacks does not grant delegations. Because NFSv4.1 relies on callbacks to
to manage pNFS layout state, pNFS operation is not possible without a manage pNFS layout state, pNFS operation is not possible without a
backchannel. backchannel.
4. Flow Control 4. Flow Control
For an RDMA Send operation to work, the receiving peer must have For an RDMA Send operation to work properly, the receiving peer must
posted an RDMA Receive Work Request (WR) to provide a receive buffer have posted a receive buffer in which to accept the incoming message.
in which to land the incoming message. If a receiver hasn't posted If a receiver hasn't posted enough buffers to accommodate each
enough Receive WRs to land incoming Send operations, the RDMA incoming Send operation, the receiving RDMA provider is allowed to
provider is allowed to drop the RDMA connection. terminate the RDMA connection.
RPC-over-RDMA transport protocols provide built-in send flow control RPC-over-RDMA transport protocols provide built-in send flow control
to prevent overrunning the number of pre-posted receive buffers on a to prevent overrunning the number of pre-posted receive buffers on a
connection's receive endpoint. This is fully discussed in connection's receive endpoint. For RPC-over-RDMA Version One, this
Section 4.3 of [I-D.ietf-nfsv4-rfc5666bis]. is discussed in Section 4.3 of [I-D.ietf-nfsv4-rfc5666bis].
4.1. Backward Credits 4.1. Backward Credits
Credits work the same way in the backward direction as they do in the Credits work the same way in the backward direction as they do in the
forward direction. However, forward direction credits and backward forward direction. However, forward direction credits and backward
direction credits are accounted separately. direction credits are accounted separately.
In other words, the forward direction credit value is the same In other words, the forward direction credit value is the same
whether or not there are backward direction resources associated with whether or not there are backward direction resources associated with
an RPC-over-RDMA transport connection. The backward direction credit an RPC-over-RDMA transport connection. The backward direction credit
skipping to change at page 7, line 31 skipping to change at page 7, line 33
granted. This is the number of backward direction Calls the granted. This is the number of backward direction Calls the
Responder is prepared to handle at once. Responder is prepared to handle at once.
When message direction is not fully determined by context or by an When message direction is not fully determined by context or by an
accompanying RPC message with a call direction field, it is not accompanying RPC message with a call direction field, it is not
possible to tell whether the header credit value is a request or possible to tell whether the header credit value is a request or
grant, or whether the value applies to the forward direction or grant, or whether the value applies to the forward direction or
backward direction. In such cases, the receiver MUST NOT use the backward direction. In such cases, the receiver MUST NOT use the
header's credit value. header's credit value.
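
The rule above can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the function name, the "client"/"server" role strings, and the use of None to mean "no accompanying RPC message" are hypothetical, and direction is derived only from the RPC msg_type field, not from other context.

```python
CALL, REPLY = 0, 1  # msg_type values from RFC 5531


def interpret_credit(receiver_role, msg_type):
    """Return (direction, meaning) for rdma_credit, or None if it must be ignored.

    receiver_role is the ONC RPC role ("client" or "server") of the endpoint
    that received the message. msg_type is None when no RPC header with a
    call direction field accompanies the transport header.
    """
    if msg_type not in (CALL, REPLY):
        return None  # direction unknown: the credit value is not used
    # A Call carries a credit request; a Reply carries a credit grant.
    meaning = "request" if msg_type == CALL else "grant"
    # A client receives forward Replies and backward Calls;
    # a server receives forward Calls and backward Replies.
    if receiver_role == "client":
        direction = "backward" if msg_type == CALL else "forward"
    else:
        direction = "forward" if msg_type == CALL else "backward"
    return (direction, meaning)
```

An implementation following this sketch would keep one credit counter per direction and update only the counter named in the returned tuple.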
4.2. Managing Receive Buffers 4.2. Inline Thresholds
Forward and backward operation on the same connection share the same
receive buffers. Therefore the inline threshold values for the
forward direction and the backward direction are the same. The call
inline threshold for the backward direction is the same as the reply
inline threshold for the forward direction, and vice versa. For more
information, see Section 4.3.2 of [I-D.ietf-nfsv4-rfc5666bis].
4.3. Managing Receive Buffers
An RPC-over-RDMA transport endpoint must pre-post receive buffers An RPC-over-RDMA transport endpoint must pre-post receive buffers
before it can receive and process incoming RPC-over-RDMA messages. before it can receive and process incoming RPC-over-RDMA messages.
If a sender transmits a message for a receiver which has no prepared If a sender transmits a message for a receiver which has no posted
receive buffer, the RDMA provider is allowed to drop the RDMA receive buffer, the RDMA provider is allowed to drop the RDMA
connection. connection.
4.2.1. Client Receive Buffers 4.3.1. Client Receive Buffers
Typically an RPC-over-RDMA Requester posts only as many receive Typically an RPC-over-RDMA Requester posts only as many receive
buffers as there are outstanding RPC Calls. A client endpoint buffers as there are outstanding RPC Calls. A client endpoint
without backward direction support might therefore at times have no without backward direction support might therefore at times have no
pre-posted receive buffers. pre-posted receive buffers.
To receive incoming backward direction Calls, an RPC-over-RDMA client To receive incoming backward direction Calls, an RPC-over-RDMA client
endpoint must pre-post enough additional receive buffers to match its endpoint must pre-post enough additional receive buffers to match its
advertised backward direction credit value. Each outstanding forward advertised backward direction credit value. Each outstanding forward
direction RPC requires an additional receive buffer above this direction RPC requires an additional receive buffer above this
minimum. minimum.
When an RDMA transport connection is lost, all active receive buffers When an RDMA transport connection is lost, all active receive buffers
are flushed and are no longer available to receive incoming messages. are flushed and are no longer available to receive incoming messages.
When a fresh transport connection is established, a client endpoint When a fresh transport connection is established, a client endpoint
must re-post a receive buffer to handle the Reply for each must re-post a receive buffer to handle the Reply for each
retransmitted forward direction Call, and a full set of receive retransmitted forward direction Call, and a full set of receive
buffers to handle backward direction Calls. buffers to handle backward direction Calls.
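
The buffer accounting described in this section might be modeled as follows. The ClientReceivePool class and its methods are hypothetical, and the sketch assumes that every outstanding forward direction Call is retransmitted after a reconnect; it does not model buffers being consumed as messages arrive.

```python
class ClientReceivePool:
    """Hypothetical model of a client endpoint's posted receive buffers."""

    def __init__(self, backward_credits):
        self.backward_credits = backward_credits  # advertised backward credit value
        self.outstanding_forward = 0              # forward Calls awaiting a Reply
        self.posted = 0
        self._top_up()

    def required(self):
        # One buffer per backward credit, plus one per outstanding forward Call.
        return self.backward_credits + self.outstanding_forward

    def _top_up(self):
        self.posted = max(self.posted, self.required())

    def send_forward_call(self):
        self.outstanding_forward += 1
        self._top_up()

    def connection_lost(self):
        # All active receive buffers are flushed with the connection.
        self.posted = 0

    def reconnected(self):
        # Re-post for retransmitted forward Calls and the full backward set.
        self._top_up()


pool = ClientReceivePool(backward_credits=4)
for _ in range(3):
    pool.send_forward_call()
```

With four backward credits and three outstanding forward Calls, the pool holds seven posted buffers, drops to zero when the connection is lost, and returns to seven once the connection is re-established.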
4.2.2. Server Receive Buffers 4.3.2. Server Receive Buffers
A forward direction RPC-over-RDMA service endpoint posts as many A forward direction RPC-over-RDMA service endpoint posts as many
receive buffers as it expects incoming forward direction Calls. That receive buffers as it expects incoming forward direction Calls. That
is, it posts no fewer buffers than the number of credits granted in is, it posts no fewer buffers than the number of credits granted in
the rdma_credit field of forward direction RPC replies. the rdma_credit field of forward direction RPC replies.
To receive incoming backward direction replies, an RPC-over-RDMA To receive incoming backward direction replies, an RPC-over-RDMA
server endpoint must pre-post a receive buffer for each backward server endpoint must pre-post a receive buffer for each backward
direction Call it sends. direction Call it sends.
When the existing transport connection is lost, all active receive When the existing transport connection is lost, all active receive
buffers are flushed and are no longer available to receive incoming buffers are flushed and are no longer available to receive incoming
messages. When a fresh transport connection is established, a server messages. When a fresh transport connection is established, a server
endpoint must re-post a receive buffer to handle the Reply for each endpoint must re-post a receive buffer to handle the Reply for each
retransmitted backward direction Call, and a full set of receive retransmitted backward direction Call, and a full set of receive
buffers for receiving forward direction Calls. buffers for receiving forward direction Calls.
5. Protocol For Backward Operation 5. Sending And Receiving Backward Operations
Performing backward direction ONC RPC operations over an RPC-over- The operation of RPC-over-RDMA transports in the forward direction is
RDMA transport connection can be accomplished by observing the defined in [RFC5531] and [I-D.ietf-nfsv4-rfc5666bis]. In this
protocol described in the following subsections. For reference, the section, a mechanism for backward direction operation on RPC-over-
XDR description of RPC-over-RDMA Version One is contained in RDMA is defined. Backward operation used in combination with forward
Section 5.1 of [I-D.ietf-nfsv4-rfc5666bis]. operation enables bi-directional communication on a common RPC
transport connection.
Certain fields in the RPC-over-RDMA header are fixed for all versions
of RPC-over-RDMA. The XDR description of these fields is contained
in Section 5.1 of [I-D.ietf-nfsv4-rfc5666bis].
5.1. Sending A Backward Direction Call 5.1. Sending A Backward Direction Call
To form a backward direction RPC-over-RDMA Call message, an ONC RPC To form a backward direction RPC-over-RDMA Call message, an ONC RPC
service endpoint constructs an RPC-over-RDMA header containing a service endpoint constructs an RPC-over-RDMA header containing a
fresh RPC XID in the rdma_xid field (see Section 2.4 for full fresh RPC XID in the rdma_xid field (see Section 2.4 for full
requirements). requirements).
The rdma_vers field MUST contain the same value in backward and The rdma_vers field MUST contain the same value in backward and
forward direction Call messages on the same connection. forward direction Call messages on the same connection.
The number of requested backward direction credits is placed in the The number of requested backward direction credits is placed in the
rdma_credit field (see Section 4). rdma_credit field (see Section 4).
Whether presented inline or as a separate chunk, the ONC RPC Call Whether presented inline or as a separate chunk, the ONC RPC Call
header MUST start with the same XID value that is present in the RPC- header MUST start with the same XID value that is present in the RPC-
over-RDMA header, and the header's msg_type field MUST contain the over-RDMA header, and the RPC header's msg_type field MUST contain
value CALL. the value CALL.
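
A minimal sketch of these steps for the simplest case follows. It assumes RPC-over-RDMA Version One framing with the RPC Call sent inline (rdma_proc of RDMA_MSG) and no chunks, so the three chunk lists are each encoded as a single zero word. The function name and the rpc_body parameter are hypothetical.

```python
import struct

RDMA_MSG = 0  # rdma_proc value for an RPC message sent inline
CALL = 0      # msg_type value from RFC 5531


def build_backward_call(xid, rdma_vers, requested_credits, rpc_body):
    """Return the bytes of a backward direction Call with no chunks.

    rpc_body is the XDR-encoded remainder of the RPC Call header and
    arguments, i.e. everything that follows the xid and msg_type fields.
    """
    # Fixed transport header fields: rdma_xid, rdma_vers, rdma_credit, rdma_proc.
    header = struct.pack(">4I", xid, rdma_vers, requested_credits, RDMA_MSG)
    # Empty read list, write list, and reply chunk.
    chunk_lists = struct.pack(">3I", 0, 0, 0)
    # The RPC header starts with the same XID, followed by msg_type CALL.
    rpc_header = struct.pack(">2I", xid, CALL)
    return header + chunk_lists + rpc_header + rpc_body


msg = build_backward_call(0x1234, 1, 8, b"")
```

The rdma_vers argument would be whatever version the forward direction Calls on the same connection already use.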
5.2. Sending A Backward Direction Reply 5.2. Sending A Backward Direction Reply
To form a backward direction RPC-over-RDMA Reply message, an ONC RPC To form a backward direction RPC-over-RDMA Reply message, an ONC RPC
client endpoint constructs an RPC-over-RDMA header containing a copy client endpoint constructs an RPC-over-RDMA header containing a copy
of the matching ONC RPC Call's RPC XID in the rdma_xid field (see of the matching ONC RPC Call's RPC XID in the rdma_xid field (see
Section 2.4 for full requirements). Section 2.4 for full requirements).
The rdma_vers field MUST contain the same value in a backward The rdma_vers field MUST contain the same value in a backward
direction Reply message as in the matching Call message. direction Reply message as in the matching Call message.
The number of granted backward direction credits is placed in the The number of granted backward direction credits is placed in the
rdma_credit field (see Section 4). rdma_credit field (see Section 4).
Whether presented inline or as a separate chunk, the ONC RPC Reply Whether presented inline or as a separate chunk, the ONC RPC Reply
header MUST start with the same XID value that is present in the RPC- header MUST start with the same XID value that is present in the RPC-
over-RDMA header, and the header's msg_type field MUST contain the over-RDMA header, and the RPC header's msg_type field MUST contain
value REPLY. the value REPLY.
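The Reply construction in Section 5.2 can be sketched in the same way. This is an illustration under stated assumptions: the header layout and the RDMA_MSG value are taken from the XDR in [I-D.ietf-nfsv4-rfc5666bis], the function name is hypothetical, and the remainder of the ONC RPC Reply (reply status, verifier, results) is treated as opaque bytes supplied by the caller.

```python
import struct

# Assumed constant values; verify against the XDR in rfc5666bis and RFC 5531.
RDMA_MSG = 0     # rdma_proc: RPC message follows inline
RPC_REPLY = 1    # ONC RPC msg_type REPLY


def build_backward_reply(call_xid, call_rdma_vers, credits_granted, reply_body):
    """Return the bytes of an inline backward direction Reply (hypothetical helper).

    call_xid and call_rdma_vers are copied from the matching Call;
    reply_body is everything in the ONC RPC Reply after msg_type.
    """
    # RPC-over-RDMA header: the XID and version are copied from the matching Call.
    rdma_header = struct.pack(
        ">IIII", call_xid, call_rdma_vers, credits_granted, RDMA_MSG
    )
    # Empty Read list, Write list, and Reply chunk (no chunks are used).
    chunk_lists = struct.pack(">III", 0, 0, 0)
    # ONC RPC Reply header: same XID, then msg_type REPLY.
    rpc_header = struct.pack(">II", call_xid, RPC_REPLY)
    return rdma_header + chunk_lists + rpc_header + reply_body
```

Copying rdma_vers from the Call, rather than using a locally configured value, is one way to keep the two fields equal.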
5.3. Backward Direction Chunks 5.3. Backward Direction Chunks
Chunks MAY be used in the backward direction. They operate the same Chunks MAY be used in the backward direction. They operate the same
way as in the forward direction (see [I-D.ietf-nfsv4-rfc5666bis] for way as in the forward direction (see [I-D.ietf-nfsv4-rfc5666bis] for
details). details).
An implementation might not support any Upper Layer Protocol that has An implementation might not support any Upper Layer Protocol that has
DDP-eligible data items. The Upper Layer Protocol may also use only DDP-eligible data items. The Upper Layer Protocol may also use only
small messages, or it may have a native mechanism for restricting the small messages, or it may have a native mechanism for restricting the
skipping to change at page 9, line 45 skipping to change at page 10, line 16
Long Messages in the backward direction. Long Messages in the backward direction.
When there is no Upper Layer Protocol requirement for chunks, When there is no Upper Layer Protocol requirement for chunks,
implementers can choose not to provide support for chunks in the implementers can choose not to provide support for chunks in the
backward direction. This avoids the complexity of adding support for backward direction. This avoids the complexity of adding support for
performing RDMA Reads and Writes in the backward direction. performing RDMA Reads and Writes in the backward direction.
When chunks are not implemented, RPC messages in the backward When chunks are not implemented, RPC messages in the backward
direction are always sent using RDMA_MSG, and therefore can be no direction are always sent using RDMA_MSG, and therefore can be no
larger than what can be sent inline (that is, without chunks). larger than what can be sent inline (that is, without chunks).
Sending an inline message larger than the receiver's inline threshold Sending an inline message larger than the inline threshold can result
can result in loss of connection. in loss of connection.
If a backward direction requester provides a non-empty chunk list to If a backward direction requester provides a non-empty chunk list to
a responder that does not support chunks, the responder MUST reply a responder that does not support chunks, the responder MUST reply
with an RDMA_ERROR message with rdma_err field set to ERR_CHUNK. with an RDMA_ERROR message with rdma_err field set to ERR_CHUNK.
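A responder without chunk support might apply the two rules above roughly as sketched here. The constant values (RDMA_MSG = 0, RDMA_NOMSG = 1, RDMA_ERROR = 4, ERR_CHUNK = 2) are assumptions based on the XDR in [I-D.ietf-nfsv4-rfc5666bis] and should be checked against that document. The function names are hypothetical, and the inline threshold is passed in by the caller rather than assumed.

```python
import struct

# Assumed constant values; verify against the XDR in rfc5666bis.
RDMA_MSG = 0
RDMA_NOMSG = 1
RDMA_ERROR = 4
ERR_CHUNK = 2


def fits_inline(message, inline_threshold):
    """True if the whole message can be sent inline (hypothetical helper)."""
    return len(message) <= inline_threshold


def reply_for_unsupported_chunks(message, credits_granted):
    """Return an RDMA_ERROR/ERR_CHUNK message if the incoming message
    carries a non-empty chunk list, else None (hypothetical helper)."""
    xid, vers, _credit, proc = struct.unpack_from(">IIII", message, 0)
    if proc not in (RDMA_MSG, RDMA_NOMSG):
        return None
    # Each chunk list starts with a "present" discriminator. When all three
    # lists are empty, the three words after the fixed header are all zero.
    # The first non-zero word found means some list is non-empty.
    if any(struct.unpack_from(">III", message, 16)):
        return struct.pack(">IIIII", xid, vers, credits_granted,
                           RDMA_ERROR, ERR_CHUNK)
    return None
```

A sender without chunk support would call fits_inline before posting a message, since there is no chunk mechanism to fall back on when the message is too large.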
5.4. Backward Direction Retransmission 5.4. Backward Direction Retransmission
In rare cases, an ONC RPC transaction cannot be completed within a In rare cases, an ONC RPC transaction cannot be completed within a
certain time. This can be because the transport connection was lost, certain time. This can be because the transport connection was lost,
the Call or Reply message was dropped, or because the Upper Layer the Call or Reply message was dropped, or because the Upper Layer
skipping to change at page 10, line 31 skipping to change at page 10, line 49
transport connection. It must wait for the ONC RPC client to re- transport connection. It must wait for the ONC RPC client to re-
establish the transport connection before it can retransmit ONC RPC establish the transport connection before it can retransmit ONC RPC
transactions in the backward direction. transactions in the backward direction.
If an ONC RPC client has no work to do, it may be some time before it If an ONC RPC client has no work to do, it may be some time before it
re-establishes a transport connection. Backward direction Requesters re-establishes a transport connection. Backward direction Requesters
must be prepared to wait indefinitely for a connection to be must be prepared to wait indefinitely for a connection to be
established before a pending backward direction ONC RPC Call can be established before a pending backward direction ONC RPC Call can be
retransmitted. retransmitted.
Forward direction Requesters are responsible for maintaining a
transport connection as long as there is the possibility of backward
direction requests. For example, an NFSv4.1 client with open
delegated files or active pNFS layouts should maintain a transport
connection so the server can send callback operations.
6. In the Absence of Backward Direction Support 6. In the Absence of Backward Direction Support
An RPC-over-RDMA transport endpoint might not support backward An RPC-over-RDMA transport endpoint might not support backward
direction operation. There might be no mechanism in the transport direction operation (and thus bi-directional operation). There might
implementation to do so. Or the Upper Layer Protocol consumer might be no mechanism in the transport implementation to do so. Or in an
not yet have configured the transport to handle backward direction implementation that can support operation in the backward direction,
traffic. the Upper Layer Protocol consumer might not yet have configured or
enabled the transport to handle backward direction traffic.
If an endpoint is not prepared to receive an incoming backward If an endpoint is not prepared to receive an incoming backward
direction message, loss of the RDMA connection might result. Thus a direction message, loss of the RDMA connection might result. Thus
denial-of-service could result if a sender continues to send backward denial of service could result if a sender continues to send backward
direction messages after every transport reconnect to an endpoint direction messages after every transport reconnect to an endpoint
that is not prepared to receive them. that is not prepared to receive them.
When dealing with the possibility that the remote peer has no When dealing with the possibility that the remote peer has no
transport level support for backward direction operation, the Upper transport level support for backward direction operation, the Upper
Layer Protocol becomes responsible for informing peers when backward Layer Protocol becomes responsible for informing peers when backward
direction operation is supported. Otherwise even a simple backward direction operation is supported. Otherwise even a simple backward
direction NULL probe from a peer could result in a lost connection. direction NULL probe from a peer could result in a lost connection.
An NFSv4.1 server does not send backchannel messages to an NFSv4.1
client before the NFSv4.1 client has sent a CREATE_SESSION or a
BIND_CONN_TO_SESSION operation. As long as an NFSv4.1 client has
prepared appropriate backchannel resources before sending one of
these operations announcing support for backchannel operation,
denial-of-service is avoided.
Therefore, an Upper Layer Protocol consumer MUST NOT perform backward Therefore, an Upper Layer Protocol consumer MUST NOT perform backward
direction ONC RPC operations unless the peer consumer has indicated direction ONC RPC operations unless the peer consumer has indicated
it is prepared to handle them. A description of Upper Layer Protocol it is prepared to handle them. A description of Upper Layer Protocol
mechanisms used for this indication is outside the scope of this mechanisms used for this indication is outside the scope of this
document. document.
7. Backward Direction Upper Layer Binding For example, an NFSv4.1 server does not send backchannel messages to
an NFSv4.1 client before the NFSv4.1 client has sent a CREATE_SESSION
or a BIND_CONN_TO_SESSION operation. As long as an NFSv4.1 client
has prepared appropriate backchannel resources before sending one of
these operations announcing support for backchannel operation, denial
of service is avoided.
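The MUST NOT in this section amounts to a gate that an Upper Layer Protocol consumer checks before issuing any backward direction call. The sketch below is one way to model that gate. The class and method names are hypothetical, and clearing the flag when the connection is lost is a conservative assumption made for this illustration, not a requirement stated in this document.

```python
class BackwardDirectionGate:
    """Tracks whether the peer has indicated it can receive backward calls."""

    def __init__(self):
        # Nothing is assumed about the peer until the ULP says otherwise.
        self.peer_ready = False

    def peer_indicated_support(self):
        # Called by the ULP when the peer announces support, for example
        # after an NFSv4.1 CREATE_SESSION or BIND_CONN_TO_SESSION.
        self.peer_ready = True

    def connection_lost(self):
        # Assumption: require a fresh indication on each new connection.
        self.peer_ready = False

    def may_send_backward_call(self):
        # Even a NULL probe is withheld until the peer has indicated support.
        return self.peer_ready
```

With this shape, a reconnect does not by itself permit backward direction traffic, which avoids the repeated connection loss described above.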
Since backward direction operation occurs on an already-established 7. Considerations For Upper Layer Bindings
connection, there is no need to specify RPC bind parameters.
An Upper Layer Protocol that operates on RPC-over-RDMA transports in An Upper Layer Protocol that operates on RPC-over-RDMA transports may
the backward direction may have DDP-eligible data items. These are have procedures that include DDP-eligible data items. DDP-
specified in an Upper Layer Binding document. eligibility is specified in an Upper Layer Binding. Direction of
operation does not obviate the need for DDP-eligibility statements.
By default, no data items in a ULP are DDP-eligible. If there are no Backward-only operation requires the client endpoint to establish a
DDP-eligible data items to document, an explicit Upper Layer Binding fresh connection. The Upper Layer Binding can specify appropriate
may not be needed for an Upper Layer Protocol that operates only in RPC binding parameters for such connections.
the backward direction.
Consult Section 7 of [I-D.ietf-nfsv4-rfc5666bis] for details about Bi-directional operation occurs on an already-established connection.
what else may be contained in a binding. Specification of RPC binding parameters is usually not necessary in
this case.
For bi-directional operation, other considerations about sharing an
RPC-over-RDMA transport with another ULP may apply. Consult
Section 7 of [I-D.ietf-nfsv4-rfc5666bis] for details about what else
may be contained in an Upper Layer Binding.
8. Security Considerations 8. Security Considerations
Security considerations for operation on RPC-over-RDMA transports are Security considerations for operation on RPC-over-RDMA transports are
outlined in Section 9 of [I-D.ietf-nfsv4-rfc5666bis]. outlined in Section 9 of [I-D.ietf-nfsv4-rfc5666bis].
9. IANA Considerations 9. IANA Considerations
This document does not require actions by IANA. This document does not require actions by IANA.
 End of changes. 45 change blocks. 
113 lines changed or deleted 145 lines changed or added
