draft-ietf-nfsv4-rpcrdma-bidirection-07.txt   draft-ietf-nfsv4-rpcrdma-bidirection-08.txt 
Network File System Version 4 C. Lever Network File System Version 4 C. Lever
Internet-Draft Oracle Internet-Draft Oracle
Intended status: Standards Track February 8, 2017 Intended status: Standards Track March 8, 2017
Expires: August 12, 2017 Expires: September 9, 2017
Bi-directional Remote Procedure Call On RPC-over-RDMA Transports Bi-directional Remote Procedure Call On RPC-over-RDMA Transports
draft-ietf-nfsv4-rpcrdma-bidirection-07 draft-ietf-nfsv4-rpcrdma-bidirection-08
Abstract Abstract
Minor versions of Network File System (NFS) version 4 newer than Minor versions of Network File System (NFS) version 4 newer than
minor version 0 work best when Remote Procedure Call (RPC) transports minor version 0 work best when Remote Procedure Call (RPC) transports
can send RPC transactions in both directions on the same connection. can send RPC transactions in both directions on the same connection.
This document describes how RPC transport endpoints capable of Remote This document describes how RPC transport endpoints capable of Remote
Direct Memory Access (RDMA) convey RPCs in both directions on a Direct Memory Access (RDMA) convey RPCs in both directions on a
single connection. single connection.
skipping to change at page 1, line 41 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 12, 2017. This Internet-Draft will expire on September 9, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 6, line 18 skipping to change at page 6, line 18
backchannel capability is available on a given transport in the backchannel capability is available on a given transport in the
arguments and results of the NFS CREATE_SESSION or arguments and results of the NFS CREATE_SESSION or
BIND_CONN_TO_SESSION operations. BIND_CONN_TO_SESSION operations.
NFS version 4.1 clients may establish distinct transport connections NFS version 4.1 clients may establish distinct transport connections
for forechannel and backchannel operation, or they may combine for forechannel and backchannel operation, or they may combine
forechannel and backchannel operation on one transport connection forechannel and backchannel operation on one transport connection
using bi-directional operation. using bi-directional operation.
Without a reverse direction RPC-over-RDMA capability, an NFS version Without a reverse direction RPC-over-RDMA capability, an NFS version
4.1 client must additionally connect using a transport with reverse 4.1 client additionally connects using a transport with reverse
direction capability to use as a backchannel. Opening an independent direction capability to use as a backchannel. Opening an independent
TCP socket is the only choice for an NFS version 4.1 backchannel TCP socket is the only choice for an NFS version 4.1 backchannel
connection in this case. connection in this case.
Implementations often find it more convenient to use a single Implementations often find it more convenient to use a single
combined transport (i.e. a transport that is capable of bi- combined transport (i.e. a transport that is capable of bi-
directional operation). This simplifies connection establishment and directional operation). This simplifies connection establishment and
recovery during network partitions or when one endpoint restarts. recovery during network partitions or when one endpoint restarts.
This can also enable better scaling by using fewer transport This can also enable better scaling by using fewer transport
connections to perform the same work. connections to perform the same work.
As with NFS version 4.0, if a backchannel is not in use, an NFS As with NFS version 4.0, if a backchannel is not in use, an NFS
version 4.1 server does not grant delegations. Because NFS version version 4.1 server does not grant delegations. Because NFS version
4.1 relies on callbacks to manage pNFS layout state, pNFS operation 4.1 relies on callbacks to manage pNFS layout state, pNFS operation
is not possible without a backchannel. is not possible without a backchannel.
4. Flow Control 4. Flow Control
For an RDMA Send operation to work properly, the receiving peer must For an RDMA Send operation to work properly, the receiving peer has
have posted a receive buffer in which to accept the incoming message. to have already posted a receive buffer in which to accept the
If a receiver hasn't posted enough buffers to accommodate each incoming message. If a receiver hasn't posted enough buffers to
incoming Send operation, the receiving RDMA provider is allowed to accommodate each incoming Send operation, the receiving RDMA provider
terminate the RDMA connection. is allowed to terminate the RDMA connection.
RPC-over-RDMA transport protocols provide built-in send flow control RPC-over-RDMA transport protocols provide built-in send flow control
to prevent overrunning the number of pre-posted receive buffers on a to prevent overrunning the number of pre-posted receive buffers on a
connection's receive endpoint. For RPC-over-RDMA Version One, this connection's receive endpoint using a "credit grant" mechanism. The
is discussed in Section 4.3 of [I-D.ietf-nfsv4-rfc5666bis]. use of credits in RPC-over-RDMA Version One is described in
Section 3.3 of [I-D.ietf-nfsv4-rfc5666bis].
4.1. Reverse-direction Credits 4.1. Reverse-direction Credits
RPC-over-RDMA credits work the same way in the reverse direction as RPC-over-RDMA credits work the same way in the reverse direction as
they do in the forward direction. However, forward direction credits they do in the forward direction. However, forward direction credits
and reverse direction credits on the same connection are accounted and reverse direction credits on the same connection are accounted
separately. separately. Direction-independent credit accounting prevents head-
of-line blocking in one direction from impacting operation in the
other direction.
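
As an illustration of the separate accounting described above, the following is a minimal sketch, not part of either draft revision. The class and method names are hypothetical. It assumes each Reply carries a fresh credit grant for its own direction, and shows that exhausting credits in one direction leaves the other direction free to send.

```python
class DirectionCredits:
    """Credit state for ONE direction of a connection (hypothetical helper)."""

    def __init__(self, granted):
        self.granted = granted    # credits most recently granted by the responder
        self.in_flight = 0        # Calls sent that have not yet been answered

    def can_send(self):
        return self.in_flight < self.granted

    def on_call_sent(self):
        if not self.can_send():
            raise RuntimeError("no credits available in this direction")
        self.in_flight += 1

    def on_reply_received(self, new_grant):
        # Assumption: the Reply's credit value replaces the previous grant.
        self.in_flight -= 1
        self.granted = new_grant


class Connection:
    """Keeps forward and reverse credits in two independent counters."""

    def __init__(self, forward_grant, reverse_grant):
        self.forward = DirectionCredits(forward_grant)
        self.reverse = DirectionCredits(reverse_grant)
```

With a forward grant of 2 and a reverse grant of 1, sending one reverse direction Call uses up the reverse credits while the forward counter is untouched.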
The forward direction credit value retains the same meaning whether The forward direction credit value retains the same meaning whether
or not there are reverse direction resources associated with an RPC- or not there are reverse direction resources associated with an RPC-
over-RDMA transport connection. This is the number of RPC requests over-RDMA transport connection. This is the number of RPC requests
the forward direction responder (the ONC RPC server) is prepared to the forward direction responder (the ONC RPC server) is prepared to
receive concurrently. receive concurrently.
The reverse direction credit value is the number of RPC requests the The reverse direction credit value is the number of RPC requests the
reverse direction responder (the ONC RPC client) is prepared to reverse direction responder (the ONC RPC client) is prepared to
receive concurrently. The reverse direction credit value MAY be receive concurrently. The reverse direction credit value MAY be
skipping to change at page 7, line 42 skipping to change at page 7, line 44
whether the header credit value is a request or grant. In such whether the header credit value is a request or grant. In such
cases, the receiver MUST ignore the header's credit value. cases, the receiver MUST ignore the header's credit value.
4.2. Inline Thresholds 4.2. Inline Thresholds
Forward and reverse direction operation on the same connection share Forward and reverse direction operation on the same connection share
the same receive buffers. Therefore the inline threshold values for the same receive buffers. Therefore the inline threshold values for
the forward direction and the reverse direction are the same. The the forward direction and the reverse direction are the same. The
call inline threshold for the reverse direction is the same as the call inline threshold for the reverse direction is the same as the
reply inline threshold for the forward direction, and vice versa. reply inline threshold for the forward direction, and vice versa.
For more information, see Section 4.3.2 of For more information, see Section 3.3.2 of
[I-D.ietf-nfsv4-rfc5666bis]. [I-D.ietf-nfsv4-rfc5666bis].
4.3. Managing Receive Buffers 4.3. Managing Receive Buffers
An RPC-over-RDMA transport endpoint must post receive buffers before An RPC-over-RDMA transport endpoint posts receive buffers before it
it can receive and process incoming RPC-over-RDMA messages. If a can receive and process incoming RPC-over-RDMA messages. If a sender
sender transmits a message for a receiver which has no posted receive transmits a message for a receiver which has no posted receive
buffer, the RDMA provider is allowed to drop the RDMA connection. buffer, the RDMA provider is allowed to drop the RDMA connection.
4.3.1. Client Receive Buffers 4.3.1. Client Receive Buffers
Typically an RPC-over-RDMA Requester posts only as many receive Typically an RPC-over-RDMA Requester posts only as many receive
buffers as there are outstanding RPC Calls. A client endpoint buffers as there are outstanding RPC Calls. A client endpoint
without reverse direction support might therefore at times have no without reverse direction support might therefore at times have no
available receive buffers. available receive buffers.
To receive incoming reverse direction Calls, an RPC-over-RDMA client To receive incoming reverse direction Calls, an RPC-over-RDMA client
endpoint must post enough additional receive buffers to match its endpoint posts enough additional receive buffers to match its
advertised reverse direction credit value. Each outstanding forward advertised reverse direction credit value. Each outstanding forward
direction RPC requires an additional receive buffer above this direction RPC requires an additional receive buffer above this
minimum. minimum.
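
The buffer count implied by the paragraph above is simple arithmetic. The following sketch uses a hypothetical function name and is only an illustration of that sum, not an implementation requirement taken from the draft.

```python
def client_receive_buffers_needed(reverse_credits, outstanding_forward_calls):
    """Receive buffers a client endpoint keeps posted (illustrative only).

    reverse_credits: the reverse direction credit value the client advertised.
    outstanding_forward_calls: forward direction Calls still awaiting a Reply.
    """
    if reverse_credits < 0 or outstanding_forward_calls < 0:
        raise ValueError("counts cannot be negative")
    # One buffer per advertised reverse credit, plus one per pending forward Reply.
    return reverse_credits + outstanding_forward_calls
```

For example, a client that advertises 4 reverse direction credits and has 3 forward direction Calls outstanding would keep 7 receive buffers posted.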
When an RDMA transport connection is lost, all active receive buffers When an RDMA transport connection is lost, all active receive buffers
are flushed and are no longer available to receive incoming messages. are flushed and are no longer available to receive incoming messages.
When a fresh transport connection is established, a client endpoint When a fresh transport connection is established, a client endpoint
must re-post a receive buffer to handle the Reply for each re-posts a receive buffer to handle the Reply for each retransmitted
retransmitted forward direction Call, and a full set of receive forward direction Call, and a full set of receive buffers to handle
buffers to handle reverse direction Calls. reverse direction Calls.
4.3.2. Server Receive Buffers 4.3.2. Server Receive Buffers
A forward direction RPC-over-RDMA service endpoint posts as many A forward direction RPC-over-RDMA service endpoint posts as many
receive buffers as it expects incoming forward direction Calls. That receive buffers as it expects incoming forward direction Calls. That
is, it posts no fewer buffers than the number of credits granted in is, it posts no fewer buffers than the number of credits granted in
the rdma_credit field of forward direction RPC replies. the rdma_credit field of forward direction RPC replies.
To receive incoming reverse direction replies, an RPC-over-RDMA To receive incoming reverse direction replies, an RPC-over-RDMA
server endpoint must post enough additional receive buffers to handle server endpoint posts enough additional receive buffers to handle
replies for each reverse direction Call it sends. replies for each reverse direction Call it sends.
When the existing transport connection is lost, all active receive When the existing transport connection is lost, all active receive
buffers are flushed and are no longer available to receive incoming buffers are flushed and are no longer available to receive incoming
messages. When a fresh transport connection is established, a server messages. When a fresh transport connection is established, a server
endpoint must re-post a receive buffer to handle the Reply for each endpoint re-posts a receive buffer to handle the Reply for each
retransmitted reverse direction Call, and a full set of receive retransmitted reverse direction Call, and a full set of receive
buffers for receiving forward direction Calls. buffers for receiving forward direction Calls.
5. Sending And Receiving Operations In The Reverse Direction 5. Sending And Receiving Operations In The Reverse Direction
The operation of RPC-over-RDMA transports in the forward direction is The operation of RPC-over-RDMA transports in the forward direction is
defined in [RFC5531] and [I-D.ietf-nfsv4-rfc5666bis]. In this defined in [RFC5531] and [I-D.ietf-nfsv4-rfc5666bis]. In this
section, a mechanism for reverse direction operation on RPC-over-RDMA section, a mechanism for reverse direction operation on RPC-over-RDMA
is defined. Reverse direction operation used in combination with is defined. Reverse direction operation used in combination with
forward operation enables bi-directional communication on a common forward operation enables bi-directional communication on a common
RPC-over-RDMA transport connection. RPC-over-RDMA transport connection.
Certain fields in the RPC-over-RDMA header have a fixed position in Certain fields in the RPC-over-RDMA header have a fixed position in
all versions of RPC-over-RDMA. The normative specification of these all versions of RPC-over-RDMA. The normative specification of these
fields is contained in Section 5.1 of [I-D.ietf-nfsv4-rfc5666bis]. fields is contained in Section 4 of [I-D.ietf-nfsv4-rfc5666bis].
5.1. Sending A Call In The Reverse Direction 5.1. Sending A Call In The Reverse Direction
To form a reverse direction RPC-over-RDMA Call message, an ONC RPC To form a reverse direction RPC-over-RDMA Call message, an ONC RPC
service endpoint constructs an RPC-over-RDMA header containing a service endpoint constructs an RPC-over-RDMA header containing a
fresh RPC XID in the rdma_xid field (see Section 2.4 for full fresh RPC XID in the rdma_xid field (see Section 2.4 for full
requirements). requirements).
The rdma_vers field MUST contain the same value in reverse and The rdma_vers field MUST contain the same value in reverse and
forward direction Call messages on the same connection. forward direction Call messages on the same connection.
skipping to change at page 9, line 47 skipping to change at page 9, line 47
The number of granted reverse direction credits is placed in the The number of granted reverse direction credits is placed in the
rdma_credit field (see Section 4). rdma_credit field (see Section 4).
Whether presented inline or as a separate chunk, the ONC RPC Reply Whether presented inline or as a separate chunk, the ONC RPC Reply
header MUST start with the same XID value that is present in the RPC- header MUST start with the same XID value that is present in the RPC-
over-RDMA header, and the RPC header's msg_type field MUST contain over-RDMA header, and the RPC header's msg_type field MUST contain
the value REPLY. the value REPLY.
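
A receiver could check the two requirements above as in the following sketch. The function name is hypothetical. It relies on the ONC RPC message layout from [RFC5531], in which the XID and msg_type are the first two 32-bit big-endian words and REPLY has the value 1.

```python
import struct

REPLY = 1  # msg_type value for a Reply in the ONC RPC header


def reply_header_is_consistent(rdma_xid, rpc_header):
    """Return True if the RPC Reply header matches the transport header.

    rdma_xid: XID taken from the RPC-over-RDMA header's rdma_xid field.
    rpc_header: bytes of the ONC RPC header, whether it arrived inline
                or was reassembled from a chunk.
    """
    if len(rpc_header) < 8:
        return False  # too short to hold the XID and msg_type words
    xid, msg_type = struct.unpack_from(">II", rpc_header, 0)
    return xid == rdma_xid and msg_type == REPLY
```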
5.3. Using Chunks In Reverse Direction Operations 5.3. Using Chunks In Reverse Direction Operations
A "chunk" refers to a portion of a message's Payload stream that is A "chunk" refers to a DDP-eligible portion of a message's Payload
moved by a separate mechanism. Chunk data may be moved by an stream that is placed directly in the receiver's memory by the
explicit RDMA operation, for example. Chunks are defined in transport. Chunk data may be moved by an explicit RDMA operation,
Section 3.4.4 of [I-D.ietf-nfsv4-rfc5666bis]. for example. Chunks are defined in Section 3.4.4 and DDP-eligibility
is covered in Section 6.1 of [I-D.ietf-nfsv4-rfc5666bis].
Chunks MAY be used in the reverse direction. They operate the same Chunks MAY be used in the reverse direction. They operate the same
way as in the forward direction. way as in the forward direction.
A backchannel implementation might not support any Upper Layer An implementation might support only Upper Layer Protocols that have
Protocol that has DDP-eligible data items. Such Upper Layer no DDP-eligible data items. Such Upper Layer Protocols may use only
Protocols may use only small messages, or they may have a native small messages, or they may have a native mechanism for restricting
mechanism for restricting the size of reverse direction RPC messages, the size of reverse direction RPC messages, obviating the need to
obviating the need to handle Long Messages in the reverse direction. handle Long Messages in the reverse direction.
When there is no Upper Layer Protocol requirement for chunks in the When there is no Upper Layer Protocol requirement for chunks in the
reverse direction, implementers can choose not to provide support for reverse direction, implementers can choose not to provide support for
chunks in the reverse direction. This avoids the complexity of chunks in the reverse direction. This avoids the complexity of
adding support for performing RDMA Reads and Writes in the reverse adding support for performing RDMA Reads and Writes in the reverse
direction. direction.
When chunks are not implemented, RPC messages in the reverse When chunks are not implemented, RPC messages in the reverse
direction are always sent using a Short message, and therefore can be direction are always sent using a Short message, and therefore can be
no larger than what can be sent inline (that is, without chunks). no larger than what can be sent inline (that is, without chunks).
skipping to change at page 10, line 36 skipping to change at page 10, line 36
If a reverse direction requester provides a non-empty chunk list to a If a reverse direction requester provides a non-empty chunk list to a
responder that does not support chunks, the responder MUST reply with responder that does not support chunks, the responder MUST reply with
an RDMA_ERROR message with rdma_err field set to ERR_CHUNK. an RDMA_ERROR message with rdma_err field set to ERR_CHUNK.
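
The rule above can be sketched as follows. The function name, argument names, and the dictionary used to represent the error message are hypothetical. The numeric values for RDMA_ERROR and ERR_CHUNK are assumed from the RPC-over-RDMA Version One XDR definition and should be confirmed against [I-D.ietf-nfsv4-rfc5666bis].

```python
RDMA_ERROR = 4  # assumed rdma_proc value for an error message
ERR_CHUNK = 2   # assumed rdma_err value for a chunk-related error


def check_reverse_call(read_list, write_list, reply_chunk, supports_chunks):
    """Decide how a reverse direction responder handles the chunk lists.

    Returns a description of the RDMA_ERROR message to send when the Call
    carries chunks the responder cannot handle, or None when processing of
    the Call can continue.
    """
    has_chunks = bool(read_list) or bool(write_list) or reply_chunk is not None
    if has_chunks and not supports_chunks:
        return {"rdma_proc": RDMA_ERROR, "rdma_err": ERR_CHUNK}
    return None
```

A responder without chunk support would run this check before decoding the RPC Call, so that a Call it cannot satisfy is rejected at the transport level rather than being passed to the Upper Layer consumer.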
5.4. Reverse Direction Retransmission 5.4. Reverse Direction Retransmission
In rare cases, an ONC RPC service cannot complete an RPC transaction In rare cases, an ONC RPC service cannot complete an RPC transaction
and then send a reply. This can be because the transport connection and then send a reply. This can be because the transport connection
was lost, the Call or Reply message was dropped, or because the Upper was lost, the Call or Reply message was dropped, or because the Upper
Layer consumer delayed or dropped the ONC RPC request. Typically, Layer consumer delayed or dropped the ONC RPC request. Typically,
the Requester sends the transaction again, reusing the same RPC XID. the Requester sends the RPC transaction again, reusing the same RPC
This is known as an "RPC retransmission". XID. This is known as an "RPC retransmission".
In the forward direction, the Requester is the ONC RPC client. The In the forward direction, the Requester is the ONC RPC client. The
client is always responsible for establishing a transport connection client is always responsible for establishing a transport connection
before sending again. before sending again.
In the reverse direction, the Requester is the ONC RPC server. With reverse direction operation, the Requester is the ONC RPC
Because an ONC RPC server does not establish transport connections server. Because an ONC RPC server does not establish transport
with clients, it cannot send a retransmission if there is no connections with clients, it cannot retransmit if there is no
transport connection. It must wait for the ONC RPC client to re- transport connection. It is forced to wait for the ONC RPC client to
establish the transport connection before it can retransmit ONC RPC re-establish a transport connection before it can retransmit ONC RPC
transactions in the reverse direction. transactions in the reverse direction.
If an ONC RPC client has no work to do, it may be some time before it If the ONC RPC client peer has no work to do, it can be some time
re-establishes a transport connection. Reverse direction Requesters before it re-establishes a transport connection. A waiting reverse
must be prepared to wait indefinitely for a connection to be direction ONC RPC Call may time out to avoid waiting indefinitely for
established before a pending reverse direction ONC RPC Call can be a connection to be established.
retransmitted.
Forward direction Requesters are responsible for maintaining a Therefore forward direction Requesters SHOULD maintain a transport
transport connection as long as there is the possibility of reverse connection as long as there is the possibility that the connection
direction requests. For example, an NFS version 4.1 client with open peer can send reverse direction requests. For example, while an NFS
delegated files or active pNFS layouts should maintain a transport version 4.1 client has open delegated files or active pNFS layouts,
connection to enable the NFS server to perform callback operations. it maintains one or more transport connections to enable the NFS
server to perform callback operations.
6. In the Absence of Support For Reverse Direction Operation 6. In the Absence of Support For Reverse Direction Operation
An RPC-over-RDMA transport endpoint might not support reverse An RPC-over-RDMA transport endpoint might not support reverse
direction operation (and thus it does not support bi-directional direction operation (and thus it does not support bi-directional
operation). There might be no mechanism in the transport operation). There might be no mechanism in the transport
implementation to do so. Or in an implementation that can support implementation to do so. Or in an implementation that can support
operation in the reverse direction, the Upper Layer Protocol consumer operation in the reverse direction, the Upper Layer Protocol consumer
might not yet have configured or enabled the transport to handle might not yet have configured or enabled the transport to handle
reverse direction traffic. reverse direction traffic.
skipping to change at page 12, line 27 skipping to change at page 12, line 27
Specification of RPC binding parameters is usually not necessary in Specification of RPC binding parameters is usually not necessary in
this case. this case.
For bi-directional operation, other considerations may apply when For bi-directional operation, other considerations may apply when
distinct RPC Programs share an RPC-over-RDMA transport connection distinct RPC Programs share an RPC-over-RDMA transport connection
concurrently. Consult Section 6 of [I-D.ietf-nfsv4-rfc5666bis] for concurrently. Consult Section 6 of [I-D.ietf-nfsv4-rfc5666bis] for
details about what else may be contained in an Upper Layer Binding. details about what else may be contained in an Upper Layer Binding.
8. Security Considerations 8. Security Considerations
Security considerations for operation on RPC-over-RDMA transports are RPC security is handled in the RPC layer, which is above the
outlined in Section 9 of [I-D.ietf-nfsv4-rfc5666bis]. transport layer where RPC-over-RDMA operates.
Reverse direction operations make use of an authentication mechanism
and credentials that are independent of forward direction operation
but otherwise operate in the same fashion as outlined in Section 8.2
of [I-D.ietf-nfsv4-rfc5666bis].
9. IANA Considerations 9. IANA Considerations
This document does not require actions by IANA. This document does not require actions by IANA.
10. Normative References 10. Normative References
[I-D.ietf-nfsv4-rfc5666bis] [I-D.ietf-nfsv4-rfc5666bis]
Lever, C., Simpson, W., and T. Talpey, "Remote Direct Lever, C., Simpson, W., and T. Talpey, "Remote Direct
Memory Access Transport for Remote Procedure Call, Version Memory Access Transport for Remote Procedure Call, Version
skipping to change at page 13, line 22 skipping to change at page 13, line 26
him for his help and support. him for his help and support.
Dave Noveck provided excellent review, constructive suggestions, and Dave Noveck provided excellent review, constructive suggestions, and
navigational guidance throughout the process of drafting this navigational guidance throughout the process of drafting this
document. document.
Dai Ngo was a solid partner and collaborator. Together we Dai Ngo was a solid partner and collaborator. Together we
constructed and tested independent prototypes of the changes constructed and tested independent prototypes of the changes
described in this document. described in this document.
The author wishes to thank Bill Baker for his unwavering support of The author wishes to thank Bill Baker and Greg Marsden for their
this work. In addition, the author gratefully acknowledges the unwavering support of this work. In addition, the author gratefully
expert contributions of Karen Deitke, Chunli Zhang, Mahesh acknowledges the expert contributions of Karen Deitke, Chunli Zhang,
Siddheshwar, Steve Wise, and Tom Tucker. Mahesh Siddheshwar, Steve Wise, and Tom Tucker.
Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 Special thanks go to Transport Area Director Spencer Dawkins, nfsv4
Working Group Chair and document shepherd Spencer Shepler, and nfsv4 Working Group Chair and document shepherd Spencer Shepler, and nfsv4
Working Group Secretary Tom Haynes for their support. Working Group Secretary Tom Haynes for their support.
Author's Address Author's Address
Charles Lever Charles Lever
Oracle Corporation Oracle Corporation
1015 Granger Avenue 1015 Granger Avenue
 End of changes. 22 change blocks. 
56 lines changed or deleted 65 lines changed or added
