draft-ietf-nfsv4-rpcrdma-bidirection-06.txt   draft-ietf-nfsv4-rpcrdma-bidirection-07.txt 
Network File System Version 4 C. Lever Network File System Version 4 C. Lever
Internet-Draft Oracle Internet-Draft Oracle
Intended status: Standards Track January 20, 2017 Intended status: Standards Track February 8, 2017
Expires: July 24, 2017 Expires: August 12, 2017
Bi-directional Remote Procedure Call On RPC-over-RDMA Transports Bi-directional Remote Procedure Call On RPC-over-RDMA Transports
draft-ietf-nfsv4-rpcrdma-bidirection-06 draft-ietf-nfsv4-rpcrdma-bidirection-07
Abstract Abstract
Minor versions of NFSv4 newer than NFSv4.0 work best when ONC RPC Minor versions of Network File System (NFS) version 4 newer than
transports can send Remote Procedure Call transactions in both minor version 0 work best when Remote Procedure Call (RPC) transports
directions on the same connection. This document describes how RPC- can send RPC transactions in both directions on the same connection.
over-RDMA transport endpoints convey RPCs in both directions on a This document describes how RPC transport endpoints capable of Remote
Direct Memory Access (RDMA) convey RPCs in both directions on a
single connection. single connection.
Requirements Language Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
Status of This Memo Status of This Memo
skipping to change at page 1, line 40 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 24, 2017. This Internet-Draft will expire on August 12, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Understanding RPC Direction . . . . . . . . . . . . . . . . . 2 2. Understanding RPC Direction . . . . . . . . . . . . . . . . . 3
3. Immediate Uses Of Bi-Directional RPC-over-RDMA . . . . . . . 4 3. Immediate Uses Of Bi-Directional RPC-over-RDMA . . . . . . . 5
4. Flow Control . . . . . . . . . . . . . . . . . . . . . . . . 6 4. Flow Control . . . . . . . . . . . . . . . . . . . . . . . . 6
5. Sending And Receiving Backward Operations . . . . . . . . . . 8 5. Sending And Receiving Operations In The Reverse Direction . . 8
6. In the Absence of Backward Direction Support . . . . . . . . 10 6. In the Absence of Support For Reverse Direction Operation . . 11
7. Considerations For Upper Layer Bindings . . . . . . . . . . . 11 7. Considerations For Upper Layer Bindings . . . . . . . . . . . 12
8. Security Considerations . . . . . . . . . . . . . . . . . . . 11 8. Security Considerations . . . . . . . . . . . . . . . . . . . 12
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
10. Normative References . . . . . . . . . . . . . . . . . . . . 12 10. Normative References . . . . . . . . . . . . . . . . . . . . 12
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 12 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 13
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 13 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
The purpose of this document is to enable concurrent operation in       RPC-over-RDMA transports, introduced in [I-D.ietf-nfsv4-rfc5666bis],
both directions on a single transport connection using RPC-over-RDMA    efficiently convey Remote Procedure Call transactions (RPCs) on
protocol versions that do not have specific facilities for backward     transport layers capable of Remote Direct Memory Access (RDMA). The
                                                                        purpose of this document is to enable concurrent operation in both
                                                                        directions on a single transport connection using RPC-over-RDMA
                                                                        protocol versions that do not have specific facilities for reverse
direction operation.                                                    direction operation.
Backward direction RPC transactions are necessary for the operation Reverse direction RPC transactions are necessary for the operation of
of NFSv4.1, and in particular, of Parallel NFS (pNFS) [RFC5661], version 4.1 of the Network File System (NFS), and in particular, of
though any Upper Layer Protocol implementation may make use of them. Parallel NFS (pNFS) [RFC5661], though any Upper Layer Protocol
An Upper Layer Binding for NFSv4.x callback operation is additionally implementation may make use of them. An Upper Layer Binding for NFS
required (see Section 7), but is not provided in this document. version 4.x callback operation is additionally required (see
Section 7), but is not provided in this document.
For example, using the approach described herein, RPC transactions For example, using the approach described herein, RPC transactions
can be conveyed in both directions on the same RPC-over-RDMA Version can be conveyed in both directions on the same RPC-over-RDMA Version
One connection without changes to the XDR description of RPC-over-      One connection without changes to the RPC-over-RDMA Version One
RDMA Version One. This document does not modify the XDR or protocol     protocol. This document does not update the protocol specified in
described in [I-D.ietf-nfsv4-rfc5666bis]. Future versions of RPC-       [I-D.ietf-nfsv4-rfc5666bis].
over-RDMA may adopt the approach described herein, or may replace it
with a different approach.                                              The remainder of this document assumes familiarity with the
                                                                        terminology and concepts contained in [I-D.ietf-nfsv4-rfc5666bis],
                                                                        especially Sections 2 and 3.
2. Understanding RPC Direction 2. Understanding RPC Direction
The ONC RPC protocol as described in [RFC5531] is architected as a The Remote Procedure Call (ONC RPC) protocol as described in
message-passing protocol between one server and one or more clients. [RFC5531] is architected as a message-passing protocol between one
ONC RPC transactions are made up of two types of messages. server and one or more clients. ONC RPC transactions are made up of
two types of messages.
A CALL message, or "Call", requests work. A Call is designated by A CALL message, or "Call", requests work. A Call is designated by
the value CALL in the message's msg_type field. An arbitrary unique the value CALL in the message's msg_type field. An arbitrary unique
value is placed in the message's xid field. A host that originates a value is placed in the message's XID field. A host that originates a
Call is referred to in this document as a "Requester." Call is referred to in this document as a "Requester."
A REPLY message, or "Reply", reports the results of work requested by A REPLY message, or "Reply", reports the results of work requested by
a Call. A Reply is designated by the value REPLY in the message's a Call. A Reply is designated by the value REPLY in the message's
msg_type field. The value contained in the message's xid field is msg_type field. The value contained in the message's XID field is
copied from the Call whose results are being returned. A host that copied from the Call whose results are being returned. A host that
emits a Reply is referred to as a "Responder." emits a Reply is referred to as a "Responder."
Typically, a Call results in a corresponding Reply. A Reply is never Typically, a Call results in a corresponding Reply. A Reply is never
sent without a corresponding Call. sent without a corresponding Call.
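The Call and Reply relationship described above can be illustrated
with a minimal sketch. It encodes only the first two XDR words of an
RPC message (the XID, then msg_type), using the msg_type values CALL
= 0 and REPLY = 1 from [RFC5531]. The helper names pack_prefix,
unpack_prefix, and make_reply_prefix are hypothetical and are not
taken from either draft revision.

```python
import struct

# msg_type values defined by the ONC RPC specification [RFC5531].
CALL, REPLY = 0, 1


def pack_prefix(xid, msg_type):
    """Encode the first two XDR words of an RPC message.

    XDR integers are 4 bytes, big-endian: first the XID, then msg_type.
    (Hypothetical helper, for illustration only.)
    """
    return struct.pack(">II", xid, msg_type)


def unpack_prefix(data):
    """Decode the XID and msg_type from the start of an RPC message."""
    xid, msg_type = struct.unpack(">II", data[:8])
    return xid, msg_type


def make_reply_prefix(call_bytes):
    """Build the prefix of a Reply: the XID is copied from the Call."""
    xid, msg_type = unpack_prefix(call_bytes)
    if msg_type != CALL:
        # A Reply is never sent without a corresponding Call.
        raise ValueError("a Reply is only sent in response to a Call")
    return pack_prefix(xid, REPLY)
```

A Responder would pass the bytes of a received Call to
make_reply_prefix and append the remainder of the Reply body, which
this sketch does not model.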
RPC-over-RDMA is a connection-oriented RPC transport. In all cases, RPC-over-RDMA is a connection-oriented RPC transport. In all cases,
when a connection-oriented transport is used, ONC RPC client when a connection-oriented transport is used, ONC RPC client
endpoints are responsible for initiating transport connections, while endpoints are responsible for initiating transport connections, while
ONC RPC service endpoints passively await incoming connection ONC RPC service endpoints passively await incoming connection
skipping to change at page 3, line 34 skipping to change at page 3, line 45
RPC direction on connectionless RPC transports is not addressed in RPC direction on connectionless RPC transports is not addressed in
this document. this document.
2.1. Forward Direction 2.1. Forward Direction
Traditionally, an ONC RPC client acts as a Requester, while an ONC Traditionally, an ONC RPC client acts as a Requester, while an ONC
RPC service acts as a Responder. This form of message passing is RPC service acts as a Responder. This form of message passing is
referred to as "forward direction" operation. referred to as "forward direction" operation.
2.2. Backward Direction 2.2. Reverse Direction
The ONC RPC specification [RFC5531] does not forbid passing messages The ONC RPC specification [RFC5531] does not forbid passing messages
in the other direction. An ONC RPC service endpoint can act as a in the other direction. An ONC RPC service endpoint can act as a
Requester, in which case an ONC RPC client endpoint acts as a Requester, in which case an ONC RPC client endpoint acts as a
Responder. This form of message passing is referred to as "backward Responder. This form of message passing is referred to as "reverse
direction" operation. direction" operation.
During backward direction operation, the ONC RPC client is During reverse direction operation, the ONC RPC client is responsible
responsible for establishing transport connections, even though ONC for establishing transport connections, even though ONC RPC Calls
RPC Calls come from the ONC RPC server. come from the ONC RPC server.
ONC RPC clients and services are optimized to perform and scale well ONC RPC clients and servers are optimized to perform and scale well
while handling traffic in the forward direction, and might not be while handling traffic in the forward direction, and might not be
prepared to handle operation in the backward direction. Not until prepared to handle operation in the reverse direction. Not until NFS
NFSv4.1 [RFC5661] has there been a strong need to handle backward version 4.1 [RFC5661] has there been a strong need to handle reverse
direction operation. direction operation.
2.3. Bi-directional Operation 2.3. Bi-directional Operation
A pair of connected RPC endpoints may choose to use only forward or A pair of connected RPC endpoints may choose to use only forward or
only backward direction operations on a particular transport. Or, only reverse direction operations on a particular transport. Or,
these endpoints may send Calls in both directions concurrently on the these endpoints may send Calls in both directions concurrently on the
same transport. same transport.
"Bi-directional operation" occurs when both transport endpoints act "Bi-directional operation" occurs when both transport endpoints act
as a Requester and a Responder at the same time. as a Requester and a Responder at the same time.
Bi-directionality is an extension of RPC transport connection Bi-directionality is an extension of RPC transport connection
sharing. Two RPC endpoints wish to exchange independent RPC messages sharing. Two RPC endpoints wish to exchange independent RPC messages
over a shared connection, but in opposite directions. These messages over a shared connection, but in opposite directions. These messages
may or may not be related to the same workloads or RPC Programs. may or may not be related to the same workloads or RPC Programs.
2.4. XID Values 2.4. XID Values
Section 9 of [RFC5531] introduces the ONC RPC transaction identifier, Section 9 of [RFC5531] introduces the ONC RPC transaction identifier,
or "xid" for short. The value of an xid is interpreted in the or "XID" for short. The value of an XID is interpreted in the
context of the message's msg_type field. context of the message's msg_type field.
o The xid of a Call is arbitrary but is unique among outstanding o The XID of a Call is arbitrary but is unique among outstanding
Calls from that Requester. Calls from that Requester.
o The xid of a Reply always matches that of the initiating Call. o The XID of a Reply always matches that of the initiating Call.
When receiving a Reply, a Requester matches the xid value in the When receiving a Reply, a Requester matches the XID value in the
Reply with a Call it previously sent. Reply with a Call it previously sent.
2.4.1. XID Generation 2.4.1. XID Generation
During bi-directional operation, forward and backward direction XIDs During bi-directional operation, forward and reverse direction XIDs
are typically generated on distinct hosts by possibly different are typically generated on distinct hosts by possibly different
algorithms. There is no co-ordination between forward and backward algorithms. There is no co-ordination between forward and reverse
direction XID generation. direction XID generation.
Therefore, a forward direction Requester MAY use the same xid value Therefore, a forward direction Requester MAY use the same XID value
at the same time as a backward direction Requester on the same at the same time as a reverse direction Requester on the same
transport connection. Though such concurrent requests use the same transport connection. Though such concurrent requests use the same
xid value, they represent distinct ONC RPC transactions. XID value, they represent distinct ONC RPC transactions.
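Because the two directions generate XIDs independently, an endpoint
can only match an incoming XID against its own outstanding Calls when
the incoming message is a Reply. The following sketch illustrates
that rule under simple assumptions: the Endpoint class, its method
names, and the returned status strings are hypothetical.

```python
# msg_type values defined by the ONC RPC specification [RFC5531].
CALL, REPLY = 0, 1


class Endpoint:
    """One side of a bi-directional connection (hypothetical model)."""

    def __init__(self):
        # XID -> label for each Call this endpoint has sent and not yet
        # seen a Reply for.
        self.outstanding = {}

    def send_call(self, xid, label):
        if xid in self.outstanding:
            raise ValueError("XID must be unique among outstanding Calls")
        self.outstanding[xid] = label

    def receive(self, xid, msg_type):
        if msg_type == REPLY:
            # Only Replies are matched against our own outstanding Calls.
            label = self.outstanding.pop(xid, None)
            if label is None:
                return ("unmatched reply", None)
            return ("completed", label)
        # An incoming Call is a distinct transaction started by the peer,
        # even when its XID equals one of our outstanding XIDs.
        return ("new call from peer", xid)
```

For example, an endpoint with an outstanding Call using XID 7 that
receives a Call with XID 7 from its peer treats the incoming message
as a new transaction; its own Call stays outstanding until a Reply
with XID 7 arrives.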
3. Immediate Uses Of Bi-Directional RPC-over-RDMA 3. Immediate Uses Of Bi-Directional RPC-over-RDMA
3.1. NFSv4.0 Callback Operation 3.1. NFS version 4.0 Callback Operation
An NFSv4.0 client employs a traditional ONC RPC client to send NFS An NFS version 4.0 client employs a traditional ONC RPC client to
requests to an NFSv4.0 server's traditional ONC RPC service send NFS requests to an NFS version 4.0 server's traditional ONC RPC
[RFC7530]. NFSv4.0 requests flow in the forward direction on a service [RFC7530]. NFS version 4.0 requests flow in the forward
connection established by the client. This connection is referred to direction on a connection established by the client. This connection
as a "forechannel" connection. is referred to as a "forechannel" connection.
An NFSv4 "delegation" is simply a promise made by a server that it An NFS version 4.x "delegation" is simply a promise made by a server
will notify a client before another client or program running on the that it will notify a client before another client or program running
server is allowed access to a file. With this guarantee, that client on the server is allowed access to a file. With this guarantee, that
can operate as sole accessor of the file. In particular, it can client can operate as sole accessor of the file. In particular, it
manage the file's data and metadata caches aggressively. can manage the file's data and metadata caches aggressively.
To administer file delegations, NFSv4.0 introduces the use of To administer file delegations, NFS version 4.0 introduces the use of
callback operations, or "callbacks", in Section 10.2 of [RFC7530]. callback operations, or "callbacks", in Section 10.2 of [RFC7530].
An NFSv4.0 server sets up a traditional ONC RPC client, and an An NFS version 4.0 server sets up a forward direction ONC RPC client,
NFSv4.0 client sets up a traditional ONC RPC service. Callbacks flow and an NFS version 4.0 client sets up a forward direction ONC RPC
in the forward direction on a connection established between the service. Callbacks flow in the forward direction on a connection
server's callback client, and the client's callback server. This established between the server's callback client, and the client's
connection is distinct from connections being used as forechannels, callback service. This connection is distinct from connections being
and is referred to as a "backchannel connection." used as forechannels, and is referred to as a "backchannel
connection."
When an RDMA transport is used as a forechannel, an NFSv4.0 client When an RDMA transport is used as a forechannel, an NFS version 4.0
typically provides a TCP callback service. The client's SETCLIENTID client typically provides a TCP-based callback service. The client's
operation advertises the callback service endpoint with a "tcp" or SETCLIENTID operation advertises the callback service endpoint with a
"tcp6" netid. The server then connects to this service using a TCP "tcp" or "tcp6" netid. The server then connects to this service
socket. using a TCP socket.
NFSv4.0 implementations can function without a backchannel in place. NFS version 4.0 implementations can function without a backchannel in
In this case, the server does not grant file delegations. This might place. In this case, the NFS server does not grant file delegations.
result in a negative performance effect, but correctness is not This might result in a negative performance effect, but correctness
affected. is not affected.
3.2. NFSv4.1 Callback Operation 3.2. NFS version 4.1 Callback Operation
NFSv4.1 supports file delegation in a similar fashion to NFSv4.0, and NFS version 4.1 supports file delegation in a similar fashion to NFS
extends the callback mechanism to manage pNFS layouts, as discussed version 4.0, and extends the callback mechanism to manage pNFS
in Section 12 of [RFC5661]. layouts, as discussed in Section 12 of [RFC5661].
NFSv4.1 transport connections are initiated by NFSv4.1 clients. NFS version 4.1 transport connections are initiated by NFS version
Therefore NFSv4.1 servers send callbacks to clients in the backward 4.1 clients. Therefore NFS version 4.1 servers send callbacks to
direction on connections established by NFSv4.1 clients. clients in the reverse direction on connections established by NFS
version 4.1 clients.
NFSv4.1 clients and servers indicate to their peers that a NFS version 4.1 clients and servers indicate to their peers that a
backchannel capability is available on a given transport in the backchannel capability is available on a given transport in the
arguments and results of NFS CREATE_SESSION or BIND_CONN_TO_SESSION arguments and results of the NFS CREATE_SESSION or
operations. BIND_CONN_TO_SESSION operations.
NFSv4.1 clients may establish distinct transport connections for NFS version 4.1 clients may establish distinct transport connections
forechannel and backchannel operation, or they may combine for forechannel and backchannel operation, or they may combine
forechannel and backchannel operation on one transport connection forechannel and backchannel operation on one transport connection
using bi-directional operation. using bi-directional operation.
Without a backward direction RPC-over-RDMA capability, an NFSv4.1 Without a reverse direction RPC-over-RDMA capability, an NFS version
client must additionally connect using a transport with backward 4.1 client must additionally connect using a transport with reverse
direction capability to use as a backchannel. TCP is the only choice direction capability to use as a backchannel. Opening an independent
for an NFSv4.1 backchannel connection in this case. TCP socket is the only choice for an NFS version 4.1 backchannel
connection in this case.
Implementations often find it more convenient to use a single Implementations often find it more convenient to use a single
combined transport (ie. a transport that is capable of bi-directional combined transport (i.e. a transport that is capable of bi-
operation). This simplifies connection establishment and recovery directional operation). This simplifies connection establishment and
during network partitions or when one endpoint restarts. This can recovery during network partitions or when one endpoint restarts.
also enable better scaling by using fewer transport connections to This can also enable better scaling by using fewer transport
perform the same work. connections to perform the same work.
As with NFSv4.0, if a backchannel is not in use, an NFSv4.1 server As with NFS version 4.0, if a backchannel is not in use, an NFS
does not grant delegations. Because NFSv4.1 relies on callbacks to version 4.1 server does not grant delegations. Because NFS version
manage pNFS layout state, pNFS operation is not possible without a 4.1 relies on callbacks to manage pNFS layout state, pNFS operation
backchannel. is not possible without a backchannel.
4. Flow Control 4. Flow Control
For an RDMA Send operation to work properly, the receiving peer must For an RDMA Send operation to work properly, the receiving peer must
have posted a receive buffer in which to accept the incoming message. have posted a receive buffer in which to accept the incoming message.
If a receiver hasn't posted enough buffers to accommodate each If a receiver hasn't posted enough buffers to accommodate each
incoming Send operation, the receiving RDMA provider is allowed to incoming Send operation, the receiving RDMA provider is allowed to
terminate the RDMA connection. terminate the RDMA connection.
RPC-over-RDMA transport protocols provide built-in send flow control RPC-over-RDMA transport protocols provide built-in send flow control
to prevent overrunning the number of pre-posted receive buffers on a to prevent overrunning the number of pre-posted receive buffers on a
connection's receive endpoint. For RPC-over-RDMA Version One, this connection's receive endpoint. For RPC-over-RDMA Version One, this
is discussed in Section 4.3 of [I-D.ietf-nfsv4-rfc5666bis]. is discussed in Section 4.3 of [I-D.ietf-nfsv4-rfc5666bis].
4.1. Backward Credits 4.1. Reverse-direction Credits
RPC-over-RDMA credits work the same way in the backward direction as RPC-over-RDMA credits work the same way in the reverse direction as
they do in the forward direction. However, forward direction credits they do in the forward direction. However, forward direction credits
and backward direction credits on the same connection are accounted and reverse direction credits on the same connection are accounted
separately. separately.
The forward direction credit value retains the same meaning whether The forward direction credit value retains the same meaning whether
or not there are backward direction resources associated with an RPC- or not there are reverse direction resources associated with an RPC-
over-RDMA transport connection. This is the number of RPC requests over-RDMA transport connection. This is the number of RPC requests
the forward direction responder (the RPC server) is prepared to the forward direction responder (the ONC RPC server) is prepared to
receive concurrently. receive concurrently.
The backward direction credit value is the number of RPC requests the The reverse direction credit value is the number of RPC requests the
backward direction responder (the RPC client) is prepared to receive reverse direction responder (the ONC RPC client) is prepared to
concurrently. The backward direction credit value MAY be different receive concurrently. The reverse direction credit value MAY be
than the forward direction credit value. different than the forward direction credit value.
During bi-directional operation, each receiver has to decide whether During bi-directional operation, each receiver has to decide whether
an incoming message contains a credit request (the receiver is acting an incoming message contains a credit request (the receiver is acting
as a responder) or a credit grant (the receiver is acting as a as a responder) or a credit grant (the receiver is acting as a
requester) and apply the credit value accordingly. requester) and apply the credit value accordingly.
When message direction is not fully determined by context (e.g., When message direction is not fully determined by context (e.g.,
suggested by the definition of the RPC-over-RDMA version that is in suggested by the definition of the RPC-over-RDMA version that is in
use) or by an accompanying RPC message payload with a call direction use) or by an accompanying RPC message payload with a call direction
field, it is not possible for the receiver to tell with certainty field, it is not possible for the receiver to tell with certainty
whether the header credit value is a request or grant. In such whether the header credit value is a request or grant. In such
cases, the receiver MUST ignore the header's credit value. cases, the receiver MUST ignore the header's credit value.
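The receiver-side decision described above can be sketched as
follows. This is a minimal illustration, not a definitive
implementation: the CreditState class and its attribute names are
hypothetical, the starting values of zero are placeholders, and the
direction argument stands for whatever the receiver was able to learn
from context or from the RPC message payload (None when the direction
cannot be determined).

```python
# msg_type values defined by the ONC RPC specification [RFC5531].
CALL, REPLY = 0, 1


class CreditState:
    """Credit accounting for one endpoint (hypothetical model)."""

    def __init__(self):
        # Credits the peer has granted for Calls this endpoint sends.
        self.granted_to_us = 0
        # Credits the peer most recently requested from this endpoint.
        self.requested_by_peer = 0

    def apply(self, rdma_credit, direction):
        """Apply the header credit value according to message direction."""
        if direction == CALL:
            # We are acting as a responder: the value is a credit request.
            self.requested_by_peer = rdma_credit
        elif direction == REPLY:
            # We are acting as a requester: the value is a credit grant.
            self.granted_to_us = rdma_credit
        # Otherwise the direction is unknown and the value is ignored.
```

Keeping the request and the grant in separate fields mirrors the
separate accounting of forward and reverse direction credits on the
same connection.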
4.2. Inline Thresholds 4.2. Inline Thresholds
Forward and backward operation on the same connection share the same Forward and reverse direction operation on the same connection share
receive buffers. Therefore the inline threshold values for the the same receive buffers. Therefore the inline threshold values for
forward direction and the backward direction are the same. The call the forward direction and the reverse direction are the same. The
inline threshold for the backward direction is the same as the reply call inline threshold for the reverse direction is the same as the
inline threshold for the forward direction, and vice versa. For more reply inline threshold for the forward direction, and vice versa.
information, see Section 4.3.2 of [I-D.ietf-nfsv4-rfc5666bis]. For more information, see Section 4.3.2 of
[I-D.ietf-nfsv4-rfc5666bis].
4.3. Managing Receive Buffers 4.3. Managing Receive Buffers
An RPC-over-RDMA transport endpoint must pre-post receive buffers An RPC-over-RDMA transport endpoint must post receive buffers before
before it can receive and process incoming RPC-over-RDMA messages. it can receive and process incoming RPC-over-RDMA messages. If a
If a sender transmits a message for a receiver which has no posted sender transmits a message for a receiver which has no posted receive
receive buffer, the RDMA provider is allowed to drop the RDMA buffer, the RDMA provider is allowed to drop the RDMA connection.
connection.
4.3.1. Client Receive Buffers 4.3.1. Client Receive Buffers
Typically an RPC-over-RDMA Requester posts only as many receive Typically an RPC-over-RDMA Requester posts only as many receive
buffers as there are outstanding RPC Calls. A client endpoint buffers as there are outstanding RPC Calls. A client endpoint
without backward direction support might therefore at times have no without reverse direction support might therefore at times have no
pre-posted receive buffers. available receive buffers.
To receive incoming backward direction Calls, an RPC-over-RDMA client To receive incoming reverse direction Calls, an RPC-over-RDMA client
endpoint must pre-post enough additional receive buffers to match its endpoint must post enough additional receive buffers to match its
advertised backward direction credit value. Each outstanding forward advertised reverse direction credit value. Each outstanding forward
direction RPC requires an additional receive buffer above this direction RPC requires an additional receive buffer above this
minimum. minimum.
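The buffer accounting above amounts to a simple sum. The sketch below
states it as a function; the function and parameter names are
hypothetical.

```python
def client_receive_buffers(reverse_credits, outstanding_forward_calls):
    """Minimum receive buffers a client endpoint keeps posted.

    One buffer per advertised reverse direction credit, plus one buffer
    for the Reply to each outstanding forward direction Call.
    (Hypothetical helper, for illustration only.)
    """
    if reverse_credits < 0 or outstanding_forward_calls < 0:
        raise ValueError("counts cannot be negative")
    return reverse_credits + outstanding_forward_calls
```

With 2 advertised reverse direction credits and 8 outstanding forward
direction Calls, the client keeps at least 10 receive buffers posted.
With no reverse direction credits and no outstanding Calls the result
is zero, which is the case of a client endpoint without reverse
direction support that has no available receive buffers.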
When an RDMA transport connection is lost, all active receive buffers When an RDMA transport connection is lost, all active receive buffers
are flushed and are no longer available to receive incoming messages. are flushed and are no longer available to receive incoming messages.
When a fresh transport connection is established, a client endpoint When a fresh transport connection is established, a client endpoint
must re-post a receive buffer to handle the Reply for each must re-post a receive buffer to handle the Reply for each
retransmitted forward direction Call, and a full set of receive retransmitted forward direction Call, and a full set of receive
buffers to handle backward direction Calls. buffers to handle reverse direction Calls.
4.3.2. Server Receive Buffers 4.3.2. Server Receive Buffers
A forward direction RPC-over-RDMA service endpoint posts as many A forward direction RPC-over-RDMA service endpoint posts as many
receive buffers as it expects incoming forward direction Calls. That receive buffers as it expects incoming forward direction Calls. That
is, it posts no fewer buffers than the number of credits granted in is, it posts no fewer buffers than the number of credits granted in
the rdma_credit field of forward direction RPC replies. the rdma_credit field of forward direction RPC replies.
To receive incoming backward direction replies, an RPC-over-RDMA To receive incoming reverse direction replies, an RPC-over-RDMA
server endpoint must pre-post enough additional receive buffers to server endpoint must post enough additional receive buffers to handle
handle replies for each backward direction Call it sends. replies for each reverse direction Call it sends.
When the existing transport connection is lost, all active receive When the existing transport connection is lost, all active receive
buffers are flushed and are no longer available to receive incoming buffers are flushed and are no longer available to receive incoming
messages. When a fresh transport connection is established, a server messages. When a fresh transport connection is established, a server
endpoint must re-post a receive buffer to handle the Reply for each endpoint must re-post a receive buffer to handle the Reply for each
retransmitted backward direction Call, and a full set of receive retransmitted reverse direction Call, and a full set of receive
buffers for receiving forward direction Calls. buffers for receiving forward direction Calls.
5. Sending And Receiving Backward Operations 5. Sending And Receiving Operations In The Reverse Direction
The operation of RPC-over-RDMA transports in the forward direction is The operation of RPC-over-RDMA transports in the forward direction is
defined in [RFC5531] and [I-D.ietf-nfsv4-rfc5666bis]. In this defined in [RFC5531] and [I-D.ietf-nfsv4-rfc5666bis]. In this
section, a mechanism for backward direction operation on RPC-over- section, a mechanism for reverse direction operation on RPC-over-RDMA
RDMA is defined. Backward operation used in combination with forward is defined. Reverse direction operation used in combination with
operation enables bi-directional communication on a common RPC forward operation enables bi-directional communication on a common
transport connection. RPC-over-RDMA transport connection.
Certain fields in the RPC-over-RDMA header are fixed for all versions Certain fields in the RPC-over-RDMA header have a fixed position in
of RPC-over-RDMA. The XDR description of these fields is contained all versions of RPC-over-RDMA. The normative specification of these
in Section 5.1 of [I-D.ietf-nfsv4-rfc5666bis]. fields is contained in Section 5.1 of [I-D.ietf-nfsv4-rfc5666bis].
5.1. Sending A Backward Direction Call 5.1. Sending A Call In The Reverse Direction
To form a backward direction RPC-over-RDMA Call message, an ONC RPC To form a reverse direction RPC-over-RDMA Call message, an ONC RPC
service endpoint constructs an RPC-over-RDMA header containing a service endpoint constructs an RPC-over-RDMA header containing a
fresh RPC XID in the rdma_xid field (see Section 2.4 for full fresh RPC XID in the rdma_xid field (see Section 2.4 for full
requirements). requirements).
The rdma_vers field MUST contain the same value in backward and The rdma_vers field MUST contain the same value in reverse and
forward direction Call messages on the same connection. forward direction Call messages on the same connection.
The number of requested backward direction credits is placed in the The number of requested reverse direction credits is placed in the
rdma_credit field (see Section 4). rdma_credit field (see Section 4).
Whether presented inline or as a separate chunk, the ONC RPC Call Whether presented inline or as a separate chunk, the ONC RPC Call
header MUST start with the same XID value that is present in the RPC- header MUST start with the same XID value that is present in the RPC-
over-RDMA header, and the RPC header's msg_type field MUST contain over-RDMA header, and the RPC header's msg_type field MUST contain
the value CALL. the value CALL.
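The requirements in this subsection can be illustrated with a minimal encoder. The sketch assumes the RPC-over-RDMA Version One header layout (four 32-bit fields followed by three chunk lists) and a Call that is sent inline with no chunks. The function name build_reverse_call and its parameters are hypothetical, and the remainder of the RPC Call header is passed in already encoded.

```python
import struct

RDMA_MSG = 0  # rdma_proc value for a message that carries an inline RPC payload
CALL = 0      # msg_type value in the ONC RPC header (RFC 5531)


def build_reverse_call(xid, rdma_vers, credits_requested, rpc_call_body):
    """Encode a reverse direction Call with empty chunk lists (sketch only).

    rpc_call_body is the already-encoded part of the RPC Call header that
    follows the msg_type field; building it is outside this sketch.
    """
    # Fixed fields: rdma_xid, rdma_vers, rdma_credit, rdma_proc.
    header = struct.pack(">IIII", xid, rdma_vers, credits_requested, RDMA_MSG)
    # Empty Read list, Write list, and Reply chunk: one zero word each.
    no_chunks = struct.pack(">III", 0, 0, 0)
    # The RPC Call header starts with the same XID, then msg_type CALL.
    rpc_header = struct.pack(">II", xid, CALL)
    return header + no_chunks + rpc_header + rpc_call_body
```

The caller is expected to pass the same rdma_vers value that is used by forward direction Calls on the connection.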
5.2. Sending A Backward Direction Reply 5.2. Sending A Reply In The Reverse Direction
To form a backward direction RPC-over-RDMA Reply message, an ONC RPC To form a reverse direction RPC-over-RDMA Reply message, an ONC RPC
client endpoint constructs an RPC-over-RDMA header containing a copy client endpoint constructs an RPC-over-RDMA header containing a copy
of the matching ONC RPC Call's RPC XID in the rdma_xid field (see of the matching ONC RPC Call's RPC XID in the rdma_xid field (see
Section 2.4 for full requirements). Section 2.4 for full requirements).
The rdma_vers field MUST contain the same value in a backward The rdma_vers field MUST contain the same value in a reverse
direction Reply message as in the matching Call message. direction Reply message as in the matching Call message.
The number of granted backward direction credits is placed in the The number of granted reverse direction credits is placed in the
rdma_credit field (see Section 4). rdma_credit field (see Section 4).
Whether presented inline or as a separate chunk, the ONC RPC Reply Whether presented inline or as a separate chunk, the ONC RPC Reply
header MUST start with the same XID value that is present in the RPC- header MUST start with the same XID value that is present in the RPC-
over-RDMA header, and the RPC header's msg_type field MUST contain over-RDMA header, and the RPC header's msg_type field MUST contain
the value REPLY. the value REPLY.
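A reverse direction Requester could check an incoming Reply against these rules roughly as follows. This is a sketch under stated assumptions: the Reply is inline with empty chunk lists, the header uses the Version One layout, and the name check_reverse_reply is hypothetical. It is not a complete receive path.

```python
import struct

RDMA_MSG = 0  # rdma_proc value for a message with an inline RPC payload
REPLY = 1     # msg_type value in the ONC RPC header (RFC 5531)


def check_reverse_reply(msg, call_xid, call_vers):
    """Validate a reverse direction Reply and return the granted credits."""
    if len(msg) < 36:
        raise ValueError("message too short")
    (rdma_xid, rdma_vers, rdma_credit, rdma_proc,
     read_list, write_list, reply_chunk,
     rpc_xid, msg_type) = struct.unpack(">9I", msg[:36])
    # This sketch declines anything other than a chunk-free inline message.
    if rdma_proc != RDMA_MSG or read_list or write_list or reply_chunk:
        raise ValueError("sketch handles only inline messages without chunks")
    # rdma_xid copies the Call's XID, and the RPC header repeats it.
    if rdma_xid != call_xid or rpc_xid != rdma_xid:
        raise ValueError("XID does not match the Call")
    # rdma_vers must equal the value used in the matching Call.
    if rdma_vers != call_vers:
        raise ValueError("rdma_vers differs from the matching Call")
    if msg_type != REPLY:
        raise ValueError("msg_type is not REPLY")
    return rdma_credit
```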
5.3. Backward Direction Chunks 5.3. Using Chunks In Reverse Direction Operations
Chunks MAY be used in the backward direction. They operate the same A "chunk" refers to a portion of a message's Payload stream that is
way as in the forward direction (see [I-D.ietf-nfsv4-rfc5666bis] for moved by a separate mechanism. Chunk data may be moved by an
details). explicit RDMA operation, for example. Chunks are defined in
Section 3.4.4 of [I-D.ietf-nfsv4-rfc5666bis].
An implementation might not support any Upper Layer Protocol that has Chunks MAY be used in the reverse direction. They operate the same
DDP-eligible data items. The Upper Layer Protocol may also use only way as in the forward direction.
small messages, or it may have a native mechanism for restricting the
size of backward direction RPC messages, obviating the need to handle
Long Messages in the backward direction.
When there is no Upper Layer Protocol requirement for chunks, A backchannel implementation might not support any Upper Layer
implementers can choose not to provide support for chunks in the Protocol that has DDP-eligible data items. Such Upper Layer
backward direction. This avoids the complexity of adding support for Protocols may use only small messages, or they may have a native
performing RDMA Reads and Writes in the backward direction. mechanism for restricting the size of reverse direction RPC messages,
obviating the need to handle Long Messages in the reverse direction.
When chunks are not implemented, RPC messages in the backward When there is no Upper Layer Protocol requirement for chunks in the
direction are always sent using RDMA_MSG, and therefore can be no reverse direction, implementers can choose not to provide support for
larger than what can be sent inline (that is, without chunks). chunks in the reverse direction. This avoids the complexity of
adding support for performing RDMA Reads and Writes in the reverse
direction.
When chunks are not implemented, RPC messages in the reverse
direction are always sent using a Short message, and therefore can be
no larger than what can be sent inline (that is, without chunks).
Sending an inline message larger than the inline threshold can result Sending an inline message larger than the inline threshold can result
in loss of connection. in loss of connection.
If a backward direction requester provides a non-empty chunk list to If a reverse direction requester provides a non-empty chunk list to a
a responder that does not support chunks, the responder MUST reply responder that does not support chunks, the responder MUST reply with
with an RDMA_ERROR message with rdma_err field set to ERR_CHUNK. an RDMA_ERROR message with rdma_err field set to ERR_CHUNK.
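The ERR_CHUNK rule can be sketched for a responder that does not implement chunks. The example assumes the Version One header layout, in which the three chunk lists begin immediately after the four fixed fields. The helper names has_chunks and err_chunk_reply are hypothetical, and the credit value placed in the error message is left to the caller because this section does not specify it.

```python
import struct

RDMA_ERROR = 4  # rdma_proc value for an error message
ERR_CHUNK = 2   # rdma_err value reporting a chunk-related failure


def has_chunks(msg):
    """Return True if the Read list, Write list, or Reply chunk is non-empty."""
    if len(msg) < 28:
        raise ValueError("message too short to hold the chunk lists")
    # While every earlier list is empty, the list discriminators sit at
    # offsets 16, 20, and 24. The first non-zero one means a chunk is present.
    for offset in (16, 20, 24):
        (present,) = struct.unpack_from(">I", msg, offset)
        if present:
            return True
    return False


def err_chunk_reply(msg, credits):
    """Build an RDMA_ERROR message with rdma_err set to ERR_CHUNK."""
    # Echo the XID and version of the offending message.
    xid, vers = struct.unpack_from(">II", msg, 0)
    return struct.pack(">IIIII", xid, vers, credits, RDMA_ERROR, ERR_CHUNK)
```

A responder would call has_chunks on each incoming reverse direction Call and send the result of err_chunk_reply instead of processing the Call when it returns True.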
5.4. Backward Direction Retransmission 5.4. Reverse Direction Retransmission
In rare cases, an ONC RPC transaction cannot be completed within a In rare cases, an ONC RPC service cannot complete an RPC transaction
certain time. This can be because the transport connection was lost, and then send a reply. This can be because the transport connection
the Call or Reply message was dropped, or because the Upper Layer was lost, the Call or Reply message was dropped, or because the Upper
consumer delayed or dropped the ONC RPC request. Typically, the Layer consumer delayed or dropped the ONC RPC request. Typically,
Requester sends the transaction again, reusing the same RPC XID. the Requester sends the transaction again, reusing the same RPC XID.
This is known as an "RPC retransmission". This is known as an "RPC retransmission".
In the forward direction, the Requester is the ONC RPC client. The In the forward direction, the Requester is the ONC RPC client. The
client is always responsible for establishing a transport connection client is always responsible for establishing a transport connection
before sending again. before sending again.
In the backward direction, the Requester is the ONC RPC server. In the reverse direction, the Requester is the ONC RPC server.
Because an ONC RPC server does not establish transport connections Because an ONC RPC server does not establish transport connections
with clients, it cannot send a retransmission if there is no with clients, it cannot send a retransmission if there is no
transport connection. It must wait for the ONC RPC client to re- transport connection. It must wait for the ONC RPC client to re-
establish the transport connection before it can retransmit ONC RPC establish the transport connection before it can retransmit ONC RPC
transactions in the backward direction. transactions in the reverse direction.
If an ONC RPC client has no work to do, it may be some time before it If an ONC RPC client has no work to do, it may be some time before it
re-establishes a transport connection. Backward direction Requesters re-establishes a transport connection. Reverse direction Requesters
must be prepared to wait indefinitely for a connection to be must be prepared to wait indefinitely for a connection to be
established before a pending backward direction ONC RPC Call can be established before a pending reverse direction ONC RPC Call can be
retransmitted. retransmitted.
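The waiting behavior described above might be modeled as follows. This is a simplified sketch, not a transport implementation: the class ReverseRequester and its methods are hypothetical, and the list named sent stands in for handing a message to the transport.

```python
class ReverseRequester:
    """Tracks reverse direction Calls until the client supplies a connection."""

    def __init__(self):
        self.connected = False
        self.pending = {}  # XID -> encoded Call still awaiting a Reply
        self.sent = []     # (xid, message) pairs handed to the transport

    def send_call(self, xid, message):
        # Remember the Call so it can be retransmitted with the same XID.
        self.pending[xid] = message
        if self.connected:
            self.sent.append((xid, message))

    def on_disconnect(self):
        # The server cannot reconnect on its own; it only waits.
        self.connected = False

    def on_client_reconnect(self):
        # The client re-established the connection: retransmit every
        # pending Call, reusing its original XID.
        self.connected = True
        for xid, message in self.pending.items():
            self.sent.append((xid, message))

    def on_reply(self, xid):
        self.pending.pop(xid, None)
```

No timer drives retransmission here. A pending Call simply stays queued for as long as the client takes to reconnect.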
Forward direction Requesters are responsible for maintaining a Forward direction Requesters are responsible for maintaining a
transport connection as long as there is the possibility of backward transport connection as long as there is the possibility of reverse
direction requests. For example, an NFSv4.1 client with open direction requests. For example, an NFS version 4.1 client with open
delegated files or active pNFS layouts should maintain a transport delegated files or active pNFS layouts should maintain a transport
connection so the server can send callback operations. connection to enable the NFS server to perform callback operations.
6. In the Absence of Backward Direction Support 6. In the Absence of Support For Reverse Direction Operation
An RPC-over-RDMA transport endpoint might not support backward An RPC-over-RDMA transport endpoint might not support reverse
direction operation (and thus it does not support bi-directional direction operation (and thus it does not support bi-directional
operation). There might be no mechanism in the transport operation). There might be no mechanism in the transport
implementation to do so. Or in an implementation that can support implementation to do so. Or in an implementation that can support
operation in the backward direction, the Upper Layer Protocol operation in the reverse direction, the Upper Layer Protocol consumer
consumer might not yet have configured or enabled the transport to might not yet have configured or enabled the transport to handle
handle backward direction traffic. reverse direction traffic.
If an endpoint is not prepared to receive an incoming backward If an endpoint is not prepared to receive an incoming reverse
direction message, loss of the RDMA connection might result. Thus direction message, loss of the RDMA connection might result. Thus
denial of service could result if a sender continues to send backward denial of service could result if a sender continues to send reverse
direction messages after every transport reconnect to an endpoint direction messages after every transport reconnect to an endpoint
that is not prepared to receive them. that is not prepared to receive them.
When dealing with the possibility that the remote peer has no When dealing with the possibility that the remote peer has no
transport level support for backward direction operation, the Upper transport level support for reverse direction operation, the Upper
Layer Protocol becomes responsible for informing peers when backward Layer Protocol becomes responsible for informing peers when reverse
direction operation is supported. Otherwise even a simple backward direction operation is supported. Otherwise even a simple reverse
direction NULL probe from a peer could result in a lost connection. direction RPC NULL procedure from a peer could result in a lost
connection.
Therefore, an Upper Layer Protocol consumer MUST NOT perform backward Therefore, an Upper Layer Protocol consumer MUST NOT perform reverse
direction ONC RPC operations unless the peer consumer has indicated direction ONC RPC operations until the peer consumer has indicated it
it is prepared to handle them. A description of Upper Layer Protocol is prepared to handle them. A description of Upper Layer Protocol
mechanisms used for this indication is outside the scope of this mechanisms used for this indication is outside the scope of this
document. document.
For example, an NFSv4.1 server does not send backchannel messages to For example, an NFS version 4.1 server does not send backchannel
an NFSv4.1 client before the NFSv4.1 client has sent a CREATE_SESSION messages to an NFS version 4.1 client before the NFS version 4.1
or a BIND_CONN_TO_SESSION operation. As long as an NFSv4.1 client client has sent a CREATE_SESSION or a BIND_CONN_TO_SESSION operation.
has prepared appropriate backchannel resources before sending one of As long as an NFS version 4.1 client has prepared appropriate
these operations announcing support for backchannel operation, denial resources to receive reverse direction operations before sending one
of service is avoided. of these NFS operations, denial of service is avoided.
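The NFS version 4.1 example can be expressed as a small gate that an Upper Layer Protocol consumer consults before sending in the reverse direction. This is a sketch only. The class name, its methods, and the use of operation names as strings are assumptions made for illustration, and other Upper Layer Protocols would use their own indication.

```python
class ReverseDirectionGate:
    """Blocks reverse direction Calls until the peer signals readiness."""

    # Operations that, per the NFS version 4.1 example, indicate the client
    # has prepared resources to receive reverse direction operations.
    READY_OPERATIONS = frozenset({"CREATE_SESSION", "BIND_CONN_TO_SESSION"})

    def __init__(self):
        self.peer_ready = False

    def note_forward_operation(self, name):
        # Called for each forward direction operation received from the peer.
        if name in self.READY_OPERATIONS:
            self.peer_ready = True

    def may_send_reverse_call(self):
        # MUST NOT send until the peer consumer has indicated readiness.
        return self.peer_ready
```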
7. Considerations For Upper Layer Bindings 7. Considerations For Upper Layer Bindings
An Upper Layer Protocol that operates on RPC-over-RDMA transports may An Upper Layer Protocol that operates on RPC-over-RDMA transports may
have procedures that include DDP-eligible data items. DDP- have procedures that include DDP-eligible data items. DDP-
eligibility is specified in an Upper Layer Binding. Direction of eligibility is specified in an Upper Layer Binding. Direction of
operation does not obviate the need for DDP-eligibility statements. operation does not obviate the need for DDP-eligibility statements.
Backward-only operation requires the client endpoint to establish a Reverse-direction-only operation requires the client endpoint to
fresh connection. The Upper Layer Binding can specify appropriate establish a fresh connection. The Upper Layer Binding can specify
RPC binding parameters for such connections. appropriate RPC binding parameters for such connections.
Bi-directional operation occurs on an already-established connection. Bi-directional operation occurs on an already-established connection.
Specification of RPC binding parameters is usually not necessary in Specification of RPC binding parameters is usually not necessary in
this case. this case.
For bi-directional operation, other considerations about sharing an For bi-directional operation, other considerations may apply when
RPC-over-RDMA transport with another ULP may apply. Consult distinct RPC Programs share an RPC-over-RDMA transport connection
Section 6 of [I-D.ietf-nfsv4-rfc5666bis] for details about what else concurrently. Consult Section 6 of [I-D.ietf-nfsv4-rfc5666bis] for
may be contained in an Upper Layer Binding. details about what else may be contained in an Upper Layer Binding.
8. Security Considerations 8. Security Considerations
Security considerations for operation on RPC-over-RDMA transports are Security considerations for operation on RPC-over-RDMA transports are
outlined in Section 9 of [I-D.ietf-nfsv4-rfc5666bis]. outlined in Section 9 of [I-D.ietf-nfsv4-rfc5666bis].
9. IANA Considerations 9. IANA Considerations
This document does not require actions by IANA. This document does not require actions by IANA.
10. Normative References 10. Normative References
[I-D.ietf-nfsv4-rfc5666bis] [I-D.ietf-nfsv4-rfc5666bis]
Lever, C., Simpson, W., and T. Talpey, "Remote Direct Lever, C., Simpson, W., and T. Talpey, "Remote Direct
Memory Access Transport for Remote Procedure Call, Version Memory Access Transport for Remote Procedure Call, Version
One", draft-ietf-nfsv4-rfc5666bis-09 (work in progress), One", draft-ietf-nfsv4-rfc5666bis-10 (work in progress),
January 2017. February 2017.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol [RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol
Specification Version 2", RFC 5531, May 2009. Specification Version 2", RFC 5531, May 2009.
[RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File [RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File
System (NFS) Version 4 Minor Version 1 Protocol", System (NFS) Version 4 Minor Version 1 Protocol",
RFC 5661, January 2010. RFC 5661, January 2010.
skipping to change at page 12, line 50 skipping to change at page 13, line 28
Dai Ngo was a solid partner and collaborator. Together we Dai Ngo was a solid partner and collaborator. Together we
constructed and tested independent prototypes of the changes constructed and tested independent prototypes of the changes
described in this document. described in this document.
The author wishes to thank Bill Baker for his unwavering support of The author wishes to thank Bill Baker for his unwavering support of
this work. In addition, the author gratefully acknowledges the this work. In addition, the author gratefully acknowledges the
expert contributions of Karen Deitke, Chunli Zhang, Mahesh expert contributions of Karen Deitke, Chunli Zhang, Mahesh
Siddheshwar, Steve Wise, and Tom Tucker. Siddheshwar, Steve Wise, and Tom Tucker.
Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 Special thanks go to Transport Area Director Spencer Dawkins, nfsv4
Working Group and document shepherd Chair Spencer Shepler, and nfsv4 Working Group Chair and document shepherd Spencer Shepler, and nfsv4
Working Group Secretary Tom Haynes for their support. Working Group Secretary Tom Haynes for their support.
Author's Address Author's Address
Charles Lever Charles Lever
Oracle Corporation Oracle Corporation
1015 Granger Avenue 1015 Granger Avenue
Ann Arbor, MI 48104 Ann Arbor, MI 48104
USA USA