draft-ietf-rddp-problem-statement-04.txt   draft-ietf-rddp-problem-statement-05.txt 
Allyn Romanow (Cisco) Internet-Draft Allyn Romanow (Cisco)
Internet-Draft Jeff Mogul (HP) Expires: April 2005 Jeff Mogul (HP)
Expires: January 2005 Tom Talpey (NetApp) Tom Talpey (NetApp)
Stephen Bailey (Sandburst) Stephen Bailey (Sandburst)
Remote Direct Memory Access (RDMA) over IP Problem Statement Remote Direct Memory Access (RDMA) over IP Problem Statement
draft-ietf-rddp-problem-statement-04 draft-ietf-rddp-problem-statement-05
Status of this Memo Status of this Memo
By submitting this Internet-Draft, I certify that any applicable By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed, patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668. disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt The list of http://www.ietf.org/ietf/1id-abstracts.txt
Internet-Draft Shadow Directories can be accessed at
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract Abstract
This draft addresses an IP-based solution to the problem of high Overhead due to the movement of user data in the end-system network
system overhead due to the movement of user data in the network I/O I/O processing path at high speeds is significant, and has limited
path. The overhead has limited the use of TCP/IP in the use of Internet protocols in interconnection networks and the
interconnection networks, especially where high bandwidth, low Internet itself - especially where high bandwidth, low latency
latency and/or low overhead of end-system data movement are and/or low overhead are required by the hosted application.
required by the hosted application. An architectural solution
enabling "copy avoidance" is proposed to eliminate it. This draft examines this overhead, and addresses an architectural,
IP-based "copy avoidance" solution for its elimination, by enabling
Remote Direct Memory Access (RDMA).
Table Of Contents Table Of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2
2. The high cost of data movement operations in network I/O . 4 2. The high cost of data movement operations in network I/O . 4
2.1. Copy avoidance improves processing overhead . . . . . . . 5 2.1. Copy avoidance improves processing overhead . . . . . . . 5
3. Memory bandwidth is the root cause of the problem . . . . 6 3. Memory bandwidth is the root cause of the problem . . . . 6
4. High copy overhead is problematic for many key Internet 4. High copy overhead is problematic for many key Internet
applications . . . . . . . . . . . . . . . . . . . . . . . 7 applications . . . . . . . . . . . . . . . . . . . . . . . 7
5. Copy Avoidance Techniques . . . . . . . . . . . . . . . . 10 5. Copy Avoidance Techniques . . . . . . . . . . . . . . . . 10
5.1. A Conceptual Framework: DDP and RDMA . . . . . . . . . . . 11 5.1. A Conceptual Framework: DDP and RDMA . . . . . . . . . . . 12
6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 12 6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 12
7. Security Considerations . . . . . . . . . . . . . . . . . 12 7. Security Considerations . . . . . . . . . . . . . . . . . 13
8. Terminology . . . . . . . . . . . . . . . . . . . . . . . 14 8. Terminology . . . . . . . . . . . . . . . . . . . . . . . 14
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 15 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 15
Informative References . . . . . . . . . . . . . . . . . . 15 Informative References . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . 19 Authors' Addresses . . . . . . . . . . . . . . . . . . . . 19
Full Copyright Statement . . . . . . . . . . . . . . . . . 20 Full Copyright Statement . . . . . . . . . . . . . . . . . 20
1. Introduction 1. Introduction
This draft considers the problem of high host processing overhead This draft considers the problem of high host processing overhead
associated with the movement of user data to and from the network associated with the movement of user data to and from the network
skipping to change at page 3, line 14 skipping to change at page 3, line 14
sessions (transport connections), which, in aggregate, are sessions (transport connections), which, in aggregate, are
responsible for > 1 Gbits/s of communication. Nonetheless, the responsible for > 1 Gbits/s of communication. Nonetheless, the
cost of copying overhead for a particular load is the same whether cost of copying overhead for a particular load is the same whether
from few or many sessions. from few or many sessions.
The I/O bottleneck, and the role of data movement operations, have The I/O bottleneck, and the role of data movement operations, have
been widely studied in research and industry over the last been widely studied in research and industry over the last
approximately 14 years, and we draw freely on these results. approximately 14 years, and we draw freely on these results.
Historically, the I/O bottleneck has received attention whenever Historically, the I/O bottleneck has received attention whenever
new networking technology has substantially increased line rates - new networking technology has substantially increased line rates -
100 Megabit per second (Mbits/s) FDDI and Fast Ethernet, 155 100 Megabit per second (Mbits/s) Fast Ethernet and Fibre
Mbits/s ATM, 1 Gbits/s Ethernet. In earlier speed transitions, the Distributed Data Interface [FDDI], 155 Mbits/s Asynchronous
availability of memory bandwidth allowed the I/O bottleneck issue Transfer Mode [ATM], 1 Gbits/s Ethernet. In earlier speed
to be deferred. Now however, this is no longer the case. While transitions, the availability of memory bandwidth allowed the I/O
the I/O problem is significant at 1 Gbits/s, it is the introduction bottleneck issue to be deferred. Now however, this is no longer
of 10 Gbits/s Ethernet which is motivating an upsurge of activity the case. While the I/O problem is significant at 1 Gbits/s, it is
in industry and research [DAFS, IB, VI, CGZ01, Ma02, MAF+02]. the introduction of 10 Gbits/s Ethernet which is motivating an
upsurge of activity in industry and research [DAFS, IB, VI, CGZ01,
Ma02, MAF+02].
Because of high overhead of end-host processing in current Because of high overhead of end-host processing in current
implementations, the TCP/IP protocol stack is not used for high implementations, the TCP/IP protocol stack is not used for high
speed transfer. Instead, special purpose network fabrics, using a speed transfer. Instead, special purpose network fabrics, using a
technology generally known as Remote Direct Memory Access (RDMA), technology generally known as Remote Direct Memory Access (RDMA),
have been developed and are widely used. RDMA is a set of have been developed and are widely used. RDMA is a set of
mechanisms that allow the network adapter, under control of the mechanisms that allow the network adapter, under control of the
application, to steer data directly into and out of application application, to steer data directly into and out of application
buffers. Examples of such interconnection fabrics include Fibre buffers. Examples of such interconnection fabrics include Fibre
Channel [FIBRE] for block storage transfer, Virtual Interface Channel [FIBRE] for block storage transfer, Virtual Interface
Architecture [VI] for database clusters, and InfiniBand [IB], Architecture [VI] for database clusters, and InfiniBand [IB],
Compaq Servernet [SRVNET] and Quadrics [QUAD] for System Area Compaq Servernet [SRVNET] and Quadrics [QUAD] for System Area
Networks. These link level technologies limit application scaling Networks. These link level technologies limit application scaling
in both distance and size, meaning that the number of nodes cannot in both distance and size, meaning that the number of nodes cannot
be arbitrarily large. be arbitrarily large.
This problem statement substantiates the claim that in network I/O This problem statement substantiates the claim that in network I/O
processing, high overhead results from data movement operations, processing, high overhead results from data movement operations,
specifically copying; and that copy avoidance significantly specifically copying; and that copy avoidance significantly
decreases the processing overhead. It describes when and why the decreases this processing overhead. It describes when and why the
high processing overheads occur, explains why the overhead is high processing overheads occur, explains why the overhead is
problematic, and points out which applications are most affected. problematic, and points out which applications are most affected.
The document goes on to discuss why the problem is relevant to the The document goes on to discuss why the problem is relevant to the
Internet and its applications, where high processing overheads work Internet and to Internet-based applications. Applications which
to limit the available scaling of end-systems. Copy avoidance store, manage and distribute the information of the Internet are
eliminates overhead and latency for these systems, and further can well suited to applying the copy avoidance solution. They will
benefit by avoiding high processing overheads, which removes limits
to the available scaling of tiered end-systems. Copy avoidance
also eliminates latency for these systems, which can further
benefit effective distributed processing. benefit effective distributed processing.
In addition, this document introduces an architectural approach to In addition, this document introduces an architectural approach to
solving the problem, which is developed in detail in [BT04]. It solving the problem, which is developed in detail in [BT04]. It
also discusses how the proposed technology may introduce security also discusses how the proposed technology may introduce security
concerns and how they should be addressed. concerns and how they should be addressed.
Finally, this document includes a Terminology section to aid as a Finally, this document includes a Terminology section to aid as a
reference for several new terms introduced by RDMA. reference for several new terms introduced by RDMA.
skipping to change at page 12, line 46 skipping to change at page 13, line 11
An architectural solution to alleviate these bottlenecks best An architectural solution to alleviate these bottlenecks best
satisfies the issue. Further, the high speed of today's satisfies the issue. Further, the high speed of today's
interconnects and the deployment of these hosts on Internet interconnects and the deployment of these hosts on Internet
Protocol-based networks leads to the desirability to layer such a Protocol-based networks leads to the desirability to layer such a
solution on the Internet Protocol Suite. The architecture solution on the Internet Protocol Suite. The architecture
described in [BT04] is such a proposal. described in [BT04] is such a proposal.
7. Security Considerations 7. Security Considerations
Solutions to the problem of reducing copying overhead in high Solutions to the problem of reducing copying overhead in high
bandwidth transfers via one or more protocols may introduce new bandwidth transfers may introduce new security concerns. Any
security concerns. Any proposed solution must be analyzed for proposed solution must be analyzed for security vulnerabilities and
security vulnerabilities and any such vulnerabilities addressed. any such vulnerabilities addressed. Potential security weaknesses
Potential security weaknesses due to resource issues that might due to resource issues that might lead to denial-of-service
lead to denial-of-service attacks, overwrites and other concurrent attacks, overwrites and other concurrent operations, the ordering
operations, the ordering of completions as required by the RDMA of completions as required by the RDMA protocol, the granularity of
protocol, the granularity of transfer, and any other identified transfer, and any other identified vulnerabilities; need to be
vulnerabilities; need to be examined, described and an adequate examined, described and an adequate resolution to them found.
resolution to them found.
Layered atop Internet transport protocols, the RDMA protocols will Layered atop Internet transport protocols, the RDMA protocols will
gain leverage from and must permit integration with Internet gain leverage from and must permit integration with Internet
security standards, such as IPsec and TLS [IPSEC, TLS]. However, security standards, such as IPsec and TLS [IPSEC, TLS]. However,
there may be implementation ramifications for certain security there may be implementation ramifications for certain security
approaches with respect to RDMA, due to its copy avoidance. approaches with respect to RDMA, due to its copy avoidance.
IPsec, operating to secure the connection on a packet-by-packet IPsec, operating to secure the connection on a packet-by-packet
basis, seems to be a natural fit to securing RDMA placement, which basis, seems to be a natural fit to securing RDMA placement, which
operates in conjunction with transport. Because RDMA enables an operates in conjunction with transport. Because RDMA enables an
implementation to avoid buffering, it is preferable to perform all implementation to avoid buffering, it is preferable to perform all
applicable security protection prior to processing each transport applicable security protection prior to processing of each segment
and RDMA layer segment. Such a layering enables the most efficient by the transport and RDMA layers. Such a layering enables the most
secure RDMA implementation. efficient secure RDMA implementation.
The TLS record protocol, on the other hand, is layered on top of The TLS record protocol, on the other hand, is layered on top of
reliable transports and cannot provide such security assurance reliable transports and cannot provide such security assurance
until an entire record is available, which may require the until an entire record is available, which may require the
buffering and/or assembly of several distinct messages prior to TLS buffering and/or assembly of several distinct messages prior to TLS
processing. This defers RDMA processing and introduces overheads processing. This defers RDMA processing and introduces overheads
that RDMA is designed to avoid. TLS therefore is viewed as that RDMA is designed to avoid. TLS therefore is viewed as
potentially a less natural fit for protecting the RDMA protocols. potentially a less natural fit for protecting the RDMA protocols.
A thorough analysis of the degree to which security protocols It is necessary to guarantee properties such as confidentiality,
address potential threats via RDMA is required. integrity, and authentication on an RDMA communications channel.
However, these properties cannot defend against all attacks from
Security for an RDMA design requires more than just securing the properly authenticated peers, which might be malicious,
communication channel. While it is necessary to be able to compromised, or buggy. Therefore the RDMA design must address
guarantee channel properties such as confidentiality, integrity, protection against such attacks. For example, an RDMA peer should
and authentication, these properties cannot defend against all
attacks from properly authenticated peers, which might be
malicious, compromised, or buggy. For example, an RDMA peer should
not be able to read or write memory regions without prior consent. not be able to read or write memory regions without prior consent.
Further, it must not be possible to evade consistency checks at the Further, it must not be possible to evade memory consistency checks
recipient. The RDMA design must allow the recipient to rely on its at the recipient. The RDMA design must allow the recipient to rely
consistent memory contents by controlling peer access to memory on its consistent memory contents by explicitly controlling peer
regions explicitly. Peers which do not pass authentication and access to memory regions at appropriate times.
authorization checks must not be permitted to connect to an
inappropriate endpoint. Peer accesses must be authenticated and
made subject to authorization checks prior to any operation to a
memory region.
The RDMA protocols must ensure that regions addressable by RDMA Peer connections which do not pass authentication and authorization
peers be under strict application control. Remote access to local checks by upper layers must not be permitted to begin processing in
memory by a network peer introduces a number of potential security RDMA mode with an inappropriate endpoint. Once associated, peer
concerns. This becomes particularly important in the Internet accesses to memory regions must be authenticated and made subject
context, where such access can be exported globally. to authorization checks in the context of the association and
connection on which they are to be performed, prior to any transfer
operation or data being accessed.
The RDMA protocols carry in part what is essentially user The RDMA protocols must ensure that these region protections be
information, explicitly including addressing information and under strict application control. Remote access to local memory by
operation type (read or write), and implicitly including protection a network peer is particularly important in the Internet context,
and attributes. As such, the protocol requires checking of these where such access can be exported globally.
higher level aspects in addition to the basic formation of
messages. The semantics associated with each class of error must
be clearly defined, and the expected action to be taken on mismatch
be specified. In some cases, this will result in a catastrophic
error on the RDMA association, however in others a local or remote
error may be signalled. Certain of these errors may require
consideration of abstract local semantics, which must be carefully
specified so as to provide useful behavior while not constraining
the implementation.
8. Terminology 8. Terminology
This section contains general terminology definitions for this This section contains general terminology definitions for this
document and for Remote Direct Memory Access in general. document and for Remote Direct Memory Access in general.
Remote Direct Memory Access (RDMA) Remote Direct Memory Access (RDMA)
A method of accessing memory on a remote system in which the A method of accessing memory on a remote system in which the
local system specifies the location of the data to be local system specifies the location of the data to be
transferred. transferred.
skipping to change at page 15, line 20 skipping to change at page 15, line 18
An RDMA interface, protocol suite and link layer specification An RDMA interface, protocol suite and link layer specification
defined by an industry trade association. [IB] defined by an industry trade association. [IB]
9. Acknowledgements 9. Acknowledgements
Jeff Chase generously provided many useful insights and Jeff Chase generously provided many useful insights and
information. Thanks to Jim Pinkerton for many helpful discussions. information. Thanks to Jim Pinkerton for many helpful discussions.
10. Informative References 10. Informative References
[ATM]
The ATM Forum, "Asynchronous Transfer Mode Physical Layer
Specification" af-phy-0015.000, etc. drafts available from
http://www.atmforum.com/standards/approved.html
[BCF+95] [BCF+95]
N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L.
Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per- Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per-
second local-area network", IEEE Micro, February 1995 second local-area network", IEEE Micro, February 1995
[BJM+96] [BJM+96]
G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes, G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes,
"An implementation of the Hamlyn send-managed interface "An implementation of the Hamlyn send-managed interface
architecture", in Proceedings of the Second Symposium on architecture", in Proceedings of the Second Symposium on
Operating Systems Design and Implementation, USENIX Assoc., Operating Systems Design and Implementation, USENIX Assoc.,
skipping to change at page 15, line 46 skipping to change at page 16, line 4
Computer Architecture, April 1994, pp. 142-153 Computer Architecture, April 1994, pp. 142-153
[Br99] [Br99]
J. C. Brustoloni, "Interoperation of copy avoidance in network J. C. Brustoloni, "Interoperation of copy avoidance in network
and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542 and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542
[BS96] [BS96]
J. C. Brustoloni, P. Steenkiste, "Effects of buffering J. C. Brustoloni, P. Steenkiste, "Effects of buffering
semantics on I/O performance", Proceedings OSDI'96, USENIX, semantics on I/O performance", Proceedings OSDI'96, USENIX,
Seattle, WA October 1996, pp. 277-291 Seattle, WA October 1996, pp. 277-291
[BT04] [BT04]
S. Bailey, T. Talpey, "The Architecture of Direct Data S. Bailey, T. Talpey, "The Architecture of Direct Data
Placement (DDP) And Remote Direct Memory Access (RDMA) On Placement (DDP) And Remote Direct Memory Access (RDMA) On
Internet Protocols", Internet Draft Work in Progress, draft- Internet Protocols", Internet Draft Work in Progress, draft-
ietf-rddp-arch-05, July 2004 ietf-rddp-arch-06, October 2004
[CFF+94] [CFF+94]
C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A. C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A.
Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High- Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High-
performance TCP/IP and UDP/IP networking in DEC OSF/1 for performance TCP/IP and UDP/IP networking in DEC OSF/1 for
Alpha AXP", Proceedings of the 3rd IEEE Symposium on High Alpha AXP", Proceedings of the 3rd IEEE Symposium on High
Performance Distributed Computing, August 1994, pp. 36-42 Performance Distributed Computing, August 1994, pp. 36-42
[CGY01] [CGY01]
J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system
skipping to change at page 17, line 16 skipping to change at page 17, line 19
performance protocols", Technical Report, HP Laboratories performance protocols", Technical Report, HP Laboratories
Bristol, HPL-93-46, July 1993 Bristol, HPL-93-46, July 1993
[EBBV95] [EBBV95]
T. von Eicken, A. Basu, V. Buch, and W. Vogels, "U-Net: A T. von Eicken, A. Basu, V. Buch, and W. Vogels, "U-Net: A
user-level network interface for parallel and distributed user-level network interface for parallel and distributed
computing", Proc. of the 15th ACM Symposium on Operating computing", Proc. of the 15th ACM Symposium on Operating
Systems Principles, Copper Mountain, Colorado, December 3-6, Systems Principles, Copper Mountain, Colorado, December 3-6,
1995 1995
[FDDI]
International Standards Organization, "Fibre Distributed Data
Interface", ISO/IEC 9314, committee drafts available from
http://www.iso.org
[FGM+99] [FGM+99]
R. Fielding, J. Gettys, J. Mogul, F. Frystyk, L. Masinter, P. R. Fielding, J. Gettys, J. Mogul, F. Frystyk, L. Masinter, P.
Leach, T. Berners-Lee, "Hypertext Transfer Protocol - Leach, T. Berners-Lee, "Hypertext Transfer Protocol -
HTTP/1.1", RFC 2616, June 1999 HTTP/1.1", RFC 2616, June 1999
[FIBRE] [FIBRE]
ANSI Technical Committee T10, "Fibre Channel Protocol (FCP)" ANSI Technical Committee T10, "Fibre Channel Protocol (FCP)"
(and as revised and updated), ANSI X3.269:1996 [R2001], (and as revised and updated), ANSI X3.269:1996 [R2001],
committee draft available from committee draft available from
http://www.t10.org/drafts.htm#FibreChannel http://www.t10.org/drafts.htm#FibreChannel
 End of changes. 
