draft-ietf-rddp-problem-statement-01.txt   draft-ietf-rddp-problem-statement-02.txt 
Allyn Romanow (Cisco) Allyn Romanow (Cisco)
Internet-Draft Jeff Mogul (HP) Internet-Draft Jeff Mogul (HP)
Expires: August 2003 Tom Talpey (NetApp) Expires: December 2003 Tom Talpey (NetApp)
Stephen Bailey (Sandburst) Stephen Bailey (Sandburst)
RDMA over IP Problem Statement RDMA over IP Problem Statement
draft-ietf-rddp-problem-statement-01.txt draft-ietf-rddp-problem-statement-02
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 43 skipping to change at page 1, line 43
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved. Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract Abstract
This draft addresses an IP-based solution to the problem of high This draft addresses an IP-based solution to the problem of high
system costs due to network I/O copying in end-hosts at high system costs due to network I/O copying in end-hosts at high
speeds. The problem is due to the high cost of memory bandwidth, speeds. The problem is due to the high cost of memory bandwidth,
and it can be substantially improved using "copy avoidance." The and it can be substantially improved using "copy avoidance." The
high overhead has prevented TCP/IP from being used as an high overhead has limited the use of TCP/IP in interconnection
interconnection network. networks especially where high bandwidth, low latency and/or low
overhead of end-system data movement are required by the hosted
application.
Table Of Contents Table Of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2
2. The high cost of data movement operations in network I/O . 3 2. The high cost of data movement operations in network I/O . 3
2.1. Copy avoidance improves processing overhead . . . . . . . 5 2.1. Copy avoidance improves processing overhead . . . . . . . 5
3. Memory bandwidth is the root cause of the problem . . . . 6 3. Memory bandwidth is the root cause of the problem . . . . 6
4. High copy overhead is problematic for many key Internet 4. High copy overhead is problematic for many key Internet
applications . . . . . . . . . . . . . . . . . . . . . . . 7 applications . . . . . . . . . . . . . . . . . . . . . . . 7
5. Copy Avoidance Techniques . . . . . . . . . . . . . . . . 9 5. Copy Avoidance Techniques . . . . . . . . . . . . . . . . 9
5.1. A Conceptual Framework: DDP and RDMA . . . . . . . . . . . 11 5.1. A Conceptual Framework: DDP and RDMA . . . . . . . . . . . 11
6. Security Considerations . . . . . . . . . . . . . . . . . 11 6. Security Considerations . . . . . . . . . . . . . . . . . 11
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 12 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 12
References . . . . . . . . . . . . . . . . . . . . . . . . 12 Informative References . . . . . . . . . . . . . . . . . . 12
Authors' Addresses . . . . . . . . . . . . . . . . . . . . 17 Authors' Addresses . . . . . . . . . . . . . . . . . . . . 17
Full Copyright Statement . . . . . . . . . . . . . . . . . 17 Full Copyright Statement . . . . . . . . . . . . . . . . . 18
1. Introduction 1. Introduction
This draft considers the problem of high host processing overhead This draft considers the problem of high host processing overhead
associated with network I/O that occurs under high speed associated with network I/O that occurs under high speed
conditions. This problem is often referred to as the "I/O conditions. This problem is often referred to as the "I/O
bottleneck" [CT90]. More specifically, the source of high overhead bottleneck" [CT90]. More specifically, the source of high overhead
that is of interest here is data movement operations - copying. that is of interest here is data movement operations - copying.
This issue is not to be confused with TCP offload, which is not This issue is not to be confused with TCP offload, which is not
addressed here. High speed refers to conditions where the network addressed here. High speed refers to conditions where the network
skipping to change at page 8, line 19 skipping to change at page 8, line 19
performance issues on SAN paths used either by the database tier or performance issues on SAN paths used either by the database tier or
the application tier.) The high overhead from network-related the application tier.) The high overhead from network-related
memory copies diverts system resources from other application memory copies diverts system resources from other application
processing. It also can create bottlenecks that limit total system processing. It also can create bottlenecks that limit total system
performance. performance.
There are a large and growing number of these application servers There are a large and growing number of these application servers
distributed throughout the Internet. In 1999 approximately 3.4 distributed throughout the Internet. In 1999 approximately 3.4
million server units were shipped, in 2000, 3.9 million units, and million server units were shipped, in 2000, 3.9 million units, and
the estimated annual growth rate for 2000-2004 was 17 percent the estimated annual growth rate for 2000-2004 was 17 percent
[Ne00, PA01]. [Ne00, Pa01].
There is high motivation to maximize the processing capacity of There is high motivation to maximize the processing capacity of
each CPU, as scaling by adding CPUs one way or another has each CPU, as scaling by adding CPUs one way or another has
drawbacks. For example, adding CPUs to a multiprocessor will not drawbacks. For example, adding CPUs to a multiprocessor will not
necessarily help, as a multiprocessor improves performance only necessarily help, as a multiprocessor improves performance only
when the memory bus has additional bandwidth to spare. Clustering when the memory bus has additional bandwidth to spare. Clustering
can add additional complexity to handling the applications. can add additional complexity to handling the applications.
In order to scale a cluster or multiprocessor system, one must In order to scale a cluster or multiprocessor system, one must
proportionately scale the interconnect bandwidth. Interconnect proportionately scale the interconnect bandwidth. Interconnect
skipping to change at page 11, line 42 skipping to change at page 11, line 42
control aspect. control aspect.
[BT02] develops an architecture for DDP and RDMA, and is a [BT02] develops an architecture for DDP and RDMA, and is a
companion draft to this problem statement. companion draft to this problem statement.
6. Security Considerations 6. Security Considerations
Solutions to the problem of reducing copying overhead in high Solutions to the problem of reducing copying overhead in high
bandwidth transfers via one or more protocols may introduce new bandwidth transfers via one or more protocols may introduce new
security concerns. Any proposed solution must be analyzed for security concerns. Any proposed solution must be analyzed for
security threats and any such threats addressed. [BSW02] brings up security threats and any such threats addressed. Potential
potential security weaknesses due to resource issues that might security weaknesses due to resource issues that might lead to
lead to denial-of-service attacks, overwrites and other concurrent denial-of-service attacks, overwrites and other concurrent
operations, the ordering of completions as required by the RDMA operations, the ordering of completions as required by the RDMA
protocol, and the granularity of transfer. Each of these concerns protocol, the granularity of transfer, and any other identified
plus any other identified threats need to be examined, described threats need to be examined, described and an adequate solution to
and an adequate solution to them found. them found.
Layered atop Internet transport protocols, the RDMA protocols will Layered atop Internet transport protocols, the RDMA protocols will
gain leverage from and must permit integration with Internet gain leverage from and must permit integration with Internet
security standards, such as IPSec and TLS [IPSEC, TLS]. A thorough security standards, such as IPSec and TLS [IPSEC, TLS]. A thorough
analysis of the degree to which these protocols solve threats is analysis of the degree to which these protocols address potential
required. threats is required.
Security for an RDMA design requires more than just securing the Security for an RDMA design requires more than just securing the
communication channel. While it is necessary to be able to communication channel. While it is necessary to be able to
guarantee channel properties such as privacy, integrity, and guarantee channel properties such as privacy, integrity, and
authentication, these properties cannot defend against all attacks authentication, these properties cannot defend against all attacks
from properly authenticated peers, which might be malicious, from properly authenticated peers, which might be malicious,
compromised, or buggy. For example, an RDMA peer should not be compromised, or buggy. For example, an RDMA peer should not be
able to read or write memory regions without prior consent. able to read or write memory regions without prior consent.
Further, it must not be possible to evade consistency checks at the Further, it must not be possible to evade consistency checks at the
recipient. For example, the RDMA design should not allow a peer to recipient. The RDMA design must allow the recipient to rely on its
update a region after the completion of an authorized update. consistent memory contents by controlling peer access to memory
regions explicitly, and must disallow peer access to regions when
not authorized.
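The access-control requirement in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions, not a mechanism defined by either revision of the draft: the names RegionTable, grant, revoke, allowed, READ and WRITE are hypothetical. The sketch shows a recipient that exposes a memory region to one named peer, checks every request against that grant, and withdraws the grant once the authorized update is complete.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical permission bits; the draft does not define any encoding.
READ = 1
WRITE = 2


@dataclass
class Region:
    start: int    # local base address of the exposed buffer
    length: int   # size of the exposed buffer in bytes
    perms: int    # bitwise OR of READ and/or WRITE


@dataclass
class RegionTable:
    """Hypothetical table of regions the local application has exposed."""
    grants: Dict[Tuple[str, int], Region] = field(default_factory=dict)
    next_handle: int = 1

    def grant(self, peer: str, start: int, length: int, perms: int) -> int:
        # Explicit consent: only the local application creates grants.
        handle = self.next_handle
        self.next_handle += 1
        self.grants[(peer, handle)] = Region(start, length, perms)
        return handle

    def revoke(self, peer: str, handle: int) -> None:
        # After revocation the peer can no longer touch the region.
        self.grants.pop((peer, handle), None)

    def allowed(self, peer: str, handle: int, offset: int,
                length: int, op: int) -> bool:
        region = self.grants.get((peer, handle))
        if region is None:
            return False          # no prior consent for this peer/handle
        if not (region.perms & op):
            return False          # operation type not authorized
        if offset < 0 or length < 0 or offset + length > region.length:
            return False          # request falls outside the region
        return True
```

In this sketch a grant is keyed by both peer and handle, so a handle learned by a different peer is useless to it, and calling revoke after an authorized update completes prevents any later update to the same region.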
The RDMA protocols must ensure that regions addressable by RDMA The RDMA protocols must ensure that regions addressable by RDMA
peers be under strict application control. Remote access to local peers be under strict application control. Remote access to local
memory by a network peer introduces a number of potential security memory by a network peer introduces a number of potential security
concerns. This becomes particularly important in the Internet concerns. This becomes particularly important in the Internet
context, where such access can be exported globally. context, where such access can be exported globally.
The RDMA protocols carry in part what is essentially user The RDMA protocols carry in part what is essentially user
information, explicitly including addressing information and information, explicitly including addressing information and
operation type (read or write), and implicitly including protection operation type (read or write), and implicitly including protection
skipping to change at page 12, line 45 skipping to change at page 12, line 47
error may be signalled. Certain of these errors may require error may be signalled. Certain of these errors may require
consideration of abstract local semantics, which must be carefully consideration of abstract local semantics, which must be carefully
specified so as to provide useful behavior while not constraining specified so as to provide useful behavior while not constraining
the implementation. the implementation.
7. Acknowledgements 7. Acknowledgements
Jeff Chase generously provided many useful insights and Jeff Chase generously provided many useful insights and
information. Thanks to Jim Pinkerton for many helpful discussions. information. Thanks to Jim Pinkerton for many helpful discussions.
8. References 8. Informative References
[BCF+95] [BCF+95]
N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L.
Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per- Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per-
second local-area network", IEEE Micro, February 1995 second local-area network", IEEE Micro, February 1995
[BJM+96] [BJM+96]
G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes, G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes,
"An implementation of the Hamlyn send-managed interface "An implementation of the Hamlyn send-managed interface
architecture", in Proceedings of the Second Symposium on architecture", in Proceedings of the Second Symposium on
Operating Systems Design and Implementation, USENIX Assoc., Operating Systems Design and Implementation, USENIX Assoc.,
October 1996 October 1996
[BLA+94] [BLA+94]
M. A. Blumrich, K. Li, R. Alpert, C. Dubnicki, E. W. Felten, M. A. Blumrich, K. Li, R. Alpert, C. Dubnicki, E. W. Felten,
"A virtual memory mapped network interface for the SHRIMP "A virtual memory mapped network interface for the SHRIMP
skipping to change at page 13, line 26 skipping to change at page 13, line 28
[Br99] [Br99]
J. C. Brustoloni, "Interoperation of copy avoidance in network J. C. Brustoloni, "Interoperation of copy avoidance in network
and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542 and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542
[BS96] [BS96]
J. C. Brustoloni, P. Steenkiste, "Effects of buffering J. C. Brustoloni, P. Steenkiste, "Effects of buffering
semantics on I/O performance", Proceedings OSDI'96, USENIX, semantics on I/O performance", Proceedings OSDI'96, USENIX,
Seattle, WA October 1996, pp. 277-291 Seattle, WA October 1996, pp. 277-291
[BSW02] RFC Editor note:
D. Black, M. Speer, J. Wroclawski, "DDP and RDMA Concerns", Replace following architecture draft-ietf- name, status and date
http://www.ietf.org/internet-drafts/draft-ietf-rddp-rdma- with appropriate reference when assigned.
concerns-00.txt, Work in Progress, December 2002
[BT02] [BT02]
S. Bailey, T. Talpey, "The Architecture of Direct Data S. Bailey, T. Talpey, "The Architecture of Direct Data
Placement (DDP) And Remote Direct Memory Access (RDMA) On Placement (DDP) And Remote Direct Memory Access (RDMA) On
Internet Protocols", Work in Progress, Internet Protocols", Internet Draft Work in Progress, draft-
http://www.ietf.org/internet-drafts/draft-ietf-rddp- ietf-rddp-arch-02, June 2003
arch-01.txt, February 2003
[CFF+94] [CFF+94]
C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A. C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A.
Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High- Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High-
performance TCP/IP and UDP/IP networking in DEC OSF/1 for performance TCP/IP and UDP/IP networking in DEC OSF/1 for
Alpha AXP", Proceedings of the 3rd IEEE Symposium on High Alpha AXP", Proceedings of the 3rd IEEE Symposium on High
Performance Distributed Computing, August 1994, pp. 36-42 Performance Distributed Computing, August 1994, pp. 36-42
[CGY01] [CGY01]
J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system
skipping to change at page 14, line 22 skipping to change at page 14, line 22
D. D. Clark, V. Jacobson, J. Romkey, H. Salwen, "An analysis D. D. Clark, V. Jacobson, J. Romkey, H. Salwen, "An analysis
of TCP processing overhead", IEEE Communications Magazine, of TCP processing overhead", IEEE Communications Magazine,
volume: 27, Issue: 6, June 1989, pp 23-29 volume: 27, Issue: 6, June 1989, pp 23-29
[CT90] [CT90]
D. D. Clark, D. Tennenhouse, "Architectural considerations for D. D. Clark, D. Tennenhouse, "Architectural considerations for
a new generation of protocols", Proceedings of the ACM SIGCOMM a new generation of protocols", Proceedings of the ACM SIGCOMM
Conference, 1990 Conference, 1990
[DAFS] [DAFS]
Direct Access File System http://www.dafscollaborative.org DAFS Collaborative, "Direct Access File System Specification
http://www.ietf.org/internet-drafts/draft-wittle-dafs-00.txt v1.0", September 2001, available from
http://www.dafscollaborative.org
[DAPP93] [DAPP93]
P. Druschel, M. B. Abbott, M. A. Pagels, L. L. Peterson, P. Druschel, M. B. Abbott, M. A. Pagels, L. L. Peterson,
"Network subsystem design", IEEE Network, July 1993, pp. 8-17 "Network subsystem design", IEEE Network, July 1993, pp. 8-17
[DP93] [DP93]
P. Druschel, L. L. Peterson, "Fbufs: a high-bandwidth cross- P. Druschel, L. L. Peterson, "Fbufs: a high-bandwidth cross-
domain transfer facility", Proceedings of the 14th ACM domain transfer facility", Proceedings of the 14th ACM
Symposium on Operating Systems Principles, December 1993 Symposium on Operating Systems Principles, December 1993
skipping to change at page 15, line 5 skipping to change at page 15, line 5
user-level network interface for parallel and distributed user-level network interface for parallel and distributed
computing", Proc. of the 15th ACM Symposium on Operating computing", Proc. of the 15th ACM Symposium on Operating
Systems Principles, Copper Mountain, Colorado, December 3-6, Systems Principles, Copper Mountain, Colorado, December 3-6,
1995 1995
[FGM+99] [FGM+99]
R. Fielding, J. Gettys, J. Mogul, F. Frystyk, L. Masinter, P. R. Fielding, J. Gettys, J. Mogul, F. Frystyk, L. Masinter, P.
Leach, T. Berners-Lee, "Hypertext Transfer Protocol - Leach, T. Berners-Lee, "Hypertext Transfer Protocol -
HTTP/1.1", RFC 2616, June 1999 HTTP/1.1", RFC 2616, June 1999
[FIBRE] [FIBRE]
Fibre Channel Standard ANSI Technical Committee T10, "Fibre Channel Protocol (FCP)"
http://www.fibrechannel.com/technology/index.master.html (and as revised and updated), ANSI X3.269:1996 [R2001],
committee draft available from
http://www.t10.org/drafts.htm#FibreChannel
[HP97] [HP97]
J. L. Hennessy, D. A. Patterson, Computer Organization and J. L. Hennessy, D. A. Patterson, Computer Organization and
Design, 2nd Edition, San Francisco: Morgan Kaufmann Design, 2nd Edition, San Francisco: Morgan Kaufmann
Publishers, 1997 Publishers, 1997
[IB] InfiniBand Architecture Specification, Volumes 1 and 2, [IB] InfiniBand Trade Association, "InfiniBand Architecture
Release 1.0.a. http://www.infinibandta.org Specification, Volumes 1 and 2", Release 1.1, November 2002,
available from http://www.infinibandta.org/specs
[KP96] [KP96]
J. Kay, J. Pasquale, "Profiling and reducing processing J. Kay, J. Pasquale, "Profiling and reducing processing
overheads in TCP/IP", IEEE/ACM Transactions on Networking, Vol overheads in TCP/IP", IEEE/ACM Transactions on Networking, Vol
4, No. 6, pp.817-828, December 1996 4, No. 6, pp.817-828, December 1996
[KSZ95] [KSZ95]
K. Kleinpaste, P. Steenkiste, B. Zill, "Software support for K. Kleinpaste, P. Steenkiste, B. Zill, "Software support for
outboard buffering and checksumming", SIGCOMM'95 outboard buffering and checksumming", SIGCOMM'95
skipping to change at page 15, line 43 skipping to change at page 15, line 46
Chase, D. Gallatin, R. Kisley, R. Wickremesinghe, E. Gabber, Chase, D. Gallatin, R. Kisley, R. Wickremesinghe, E. Gabber,
"Structure and Performance of the Direct Access File System "Structure and Performance of the Direct Access File System
(DAFS)", accepted for publication at the 2002 USENIX Annual (DAFS)", accepted for publication at the 2002 USENIX Annual
Technical Conference, Monterey, CA, June 9-14, 2002. Technical Conference, Monterey, CA, June 9-14, 2002.
[Mc95] [Mc95]
J. D. McCalpin, "A Survey of memory bandwidth and machine J. D. McCalpin, "A Survey of memory bandwidth and machine
balance in current high performance computers", IEEE TCCA balance in current high performance computers", IEEE TCCA
Newsletter, December 1995 Newsletter, December 1995
[MYR]
Myrinet, http://www.myricom.com
[Ne00] [Ne00]
A. Newman, "IDC report paints conflicted picture of server A. Newman, "IDC report paints conflicted picture of server
market circa 2004", ServerWatch, July 24, 2000 market circa 2004", ServerWatch, July 24, 2000
http://serverwatch.internet.com/news/2000_07_24_a.html http://serverwatch.internet.com/news/2000_07_24_a.html
[Pa01] [Pa01]
M. Pastore, "Server shipments for 2000 surpass those in 1999", M. Pastore, "Server shipments for 2000 surpass those in 1999",
ServerWatch, February 7, 2001 ServerWatch, February 7, 2001
http://serverwatch.internet.com/news/2001_02_07_a.html http://serverwatch.internet.com/news/2001_02_07_a.html
[PAC+97] [PAC+97]
skipping to change at page 16, line 21 skipping to change at page 16, line 21
C. Kozyrakis, R. Thomas, K. Yelick, "A case for intelligent C. Kozyrakis, R. Thomas, K. Yelick, "A case for intelligent
RAM: IRAM", IEEE Micro, April 1997 RAM: IRAM", IEEE Micro, April 1997
[PDZ99] [PDZ99]
V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O
buffering and caching system", Proc. of the 3rd Symposium on buffering and caching system", Proc. of the 3rd Symposium on
Operating Systems Design and Implementation, New Orleans, LA, Operating Systems Design and Implementation, New Orleans, LA,
February 1999 February 1999
[Pi01] [Pi01]
J. Pinkerton, "Winsock Direct: the value of System Area J. Pinkerton, "Winsock Direct: The Value of System Area
Networks". http://www.microsoft.com/windows2000/techinfo/ Networks", May 2001, available from
http://www.microsoft.com/windows2000/techinfo/
howitworks/communications/winsock.asp howitworks/communications/winsock.asp
[Po81] [Po81]
J. Postel, "Transmission Control Protocol - DARPA Internet J. Postel, "Transmission Control Protocol - DARPA Internet
Program Protocol Specification", RFC 793, September 1981 Program Protocol Specification", RFC 793, September 1981
[QUAD] [QUAD]
Quadrics Ltd., http://www.quadrics.com Quadrics Ltd., Quadrics QSNet product information, available
from http://www.quadrics.com/website/pages/02qsn.html
[SDP] [SDP]
Sockets Direct Protocol v1.0 InfiniBand Trade Association, "Sockets Direct Protocol v1.0",
Annex A of InfiniBand Architecture Specification Volume 1,
Release 1.1, November 2002, available from
http://www.infinibandta.org/specs
[SRVNET] [SRVNET]
Compaq Servernet, R. Horst, "TNet: A reliable system area network", IEEE Micro,
http://nonstop.compaq.com/view.asp?PAGE=ServerNet pp. 37-45, February 1995
[STREAM] [STREAM]
The STREAM Benchmark Reference Information, J. D. McAlpin, The STREAM Benchmark Reference Information,
http://www.cs.virginia.edu/stream/ http://www.cs.virginia.edu/stream/
[TK95] [TK95]
M. N. Thadani, Y. A. Khalidi, "An efficient zero-copy I/O M. N. Thadani, Y. A. Khalidi, "An efficient zero-copy I/O
framework for UNIX", Technical Report, SMLI TR-95-39, May 1995 framework for UNIX", Technical Report, SMLI TR-95-39, May 1995
[VI] Compaq Computer Corp., Intel Corporation and Microsoft
[VI] Virtual Interface Architecture Specification Version 1.0. Corporation, "Virtual Interface Architecture Specification
Version 1.0", December 1997, available from
http://www.vidf.org/info/04standards.html http://www.vidf.org/info/04standards.html
[Wa97] [Wa97]
J. R. Walsh, "DART: Fast application-level networking via J. R. Walsh, "DART: Fast application-level networking via
data-copy avoidance", IEEE Network, July/August 1997, pp. data-copy avoidance", IEEE Network, July/August 1997, pp.
28-38 28-38
Authors' Addresses Authors' Addresses
Stephen Bailey Stephen Bailey
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/