draft-ietf-rddp-problem-statement-03.txt   draft-ietf-rddp-problem-statement-04.txt 
Allyn Romanow (Cisco) Allyn Romanow (Cisco)
Internet-Draft Jeff Mogul (HP) Internet-Draft Jeff Mogul (HP)
Expires: July 2004 Tom Talpey (NetApp) Expires: January 2005 Tom Talpey (NetApp)
Stephen Bailey (Sandburst) Stephen Bailey (Sandburst)
RDMA over IP Problem Statement Remote Direct Memory Access (RDMA) over IP Problem Statement
draft-ietf-rddp-problem-statement-03 draft-ietf-rddp-problem-statement-04
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with By submitting this Internet-Draft, I certify that any applicable
all provisions of Section 10 of RFC2026. patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in as reference material or to cite them other than as "work in
progress." progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt The list of
Internet-Draft Shadow Directories can be accessed at
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html
http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract Abstract
This draft addresses an IP-based solution to the problem of high This draft addresses an IP-based solution to the problem of high
system overhead due to end-host copying of user data in the network system overhead due to the movement of user data in the network I/O
I/O path at high speeds. The problem is due to the high cost of path. The overhead has limited the use of TCP/IP in
memory bandwidth, and it can be substantially improved using "copy interconnection networks, especially where high bandwidth, low
avoidance." The overhead has limited the use of TCP/IP in
interconnection networks especially where high bandwidth, low
latency and/or low overhead of end-system data movement are latency and/or low overhead of end-system data movement are
required by the hosted application. required by the hosted application. An architectural solution
enabling "copy avoidance" is proposed to eliminate it.
Table Of Contents Table Of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2
2. The high cost of data movement operations in network I/O . 3 2. The high cost of data movement operations in network I/O . 4
2.1. Copy avoidance improves processing overhead . . . . . . . 5 2.1. Copy avoidance improves processing overhead . . . . . . . 5
3. Memory bandwidth is the root cause of the problem . . . . 6 3. Memory bandwidth is the root cause of the problem . . . . 6
4. High copy overhead is problematic for many key Internet 4. High copy overhead is problematic for many key Internet
applications . . . . . . . . . . . . . . . . . . . . . . . 7 applications . . . . . . . . . . . . . . . . . . . . . . . 7
5. Copy Avoidance Techniques . . . . . . . . . . . . . . . . 9 5. Copy Avoidance Techniques . . . . . . . . . . . . . . . . 10
5.1. A Conceptual Framework: DDP and RDMA . . . . . . . . . . . 11 5.1. A Conceptual Framework: DDP and RDMA . . . . . . . . . . . 11
6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 12 6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 12
7. Security Considerations . . . . . . . . . . . . . . . . . 12 7. Security Considerations . . . . . . . . . . . . . . . . . 12
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 13 8. Terminology . . . . . . . . . . . . . . . . . . . . . . . 14
Informative References . . . . . . . . . . . . . . . . . . 13 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . 18 Informative References . . . . . . . . . . . . . . . . . . 15
Full Copyright Statement . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . 19
Full Copyright Statement . . . . . . . . . . . . . . . . . 20
1. Introduction 1. Introduction
This draft considers the problem of high host processing overhead This draft considers the problem of high host processing overhead
associated with the movement of user data to and from the network associated with the movement of user data to and from the network
interface under high speed conditions. This problem is often interface under high speed conditions. This problem is often
referred to as the "I/O bottleneck" [CT90]. More specifically, the referred to as the "I/O bottleneck" [CT90]. More specifically, the
source of high overhead that is of interest here is data movement source of high overhead that is of interest here is data movement
operations - copying. The throughput of a system may therefore be operations - copying. The throughput of a system may therefore be
limited by the overhead of this copying. This issue is not to be limited by the overhead of this copying. This issue is not to be
confused with TCP offload, which is not addressed here. High speed confused with TCP offload, which is not addressed here. High speed
refers to conditions where the network link speed is high relative refers to conditions where the network link speed is high relative
to the bandwidths of the host CPU and memory. With today's to the bandwidths of the host CPU and memory. With today's
computer systems, one Gbits/s and over is considered high speed. computer systems, one Gigabit per second (Gbits/s) and over is
considered high speed.
High costs associated with copying are an issue primarily for large High costs associated with copying are an issue primarily for large
scale systems. Although smaller systems such as rack-mounted PCs scale systems. Although smaller systems such as rack-mounted PCs
and small workstations would benefit from a reduction in copying and small workstations would benefit from a reduction in copying
overhead, the benefit to smaller machines will be primarily in the overhead, the benefit to smaller machines will be primarily in the
next few years as they scale in the amount of bandwidth they next few years as they scale in the amount of bandwidth they
handle. Today it is large system machines with high bandwidth handle. Today it is large system machines with high bandwidth
feeds, usually multiprocessors and clusters, that are adversely feeds, usually multiprocessors and clusters, that are adversely
affected by copying overhead. Examples of such machines include affected by copying overhead. Examples of such machines include
all varieties of servers: database servers, storage servers, all varieties of servers: database servers, storage servers,
skipping to change at page 3, line 12 skipping to change at page 3, line 14
sessions (transport connections), which, in aggregate, are sessions (transport connections), which, in aggregate, are
responsible for > 1 Gbits/s of communication. Nonetheless, the responsible for > 1 Gbits/s of communication. Nonetheless, the
cost of copying overhead for a particular load is the same whether cost of copying overhead for a particular load is the same whether
from few or many sessions. from few or many sessions.
The I/O bottleneck, and the role of data movement operations, have The I/O bottleneck, and the role of data movement operations, have
been widely studied in research and industry over the last been widely studied in research and industry over the last
approximately 14 years, and we draw freely on these results. approximately 14 years, and we draw freely on these results.
Historically, the I/O bottleneck has received attention whenever Historically, the I/O bottleneck has received attention whenever
new networking technology has substantially increased line rates - new networking technology has substantially increased line rates -
100 Mbits/s FDDI and Fast Ethernet, 155 Mbits/s ATM, 1 Gbits/s 100 Megabit per second (Mbits/s) FDDI and Fast Ethernet, 155
Ethernet. In earlier speed transitions, the availability of memory Mbits/s ATM, 1 Gbits/s Ethernet. In earlier speed transitions, the
bandwidth allowed the I/O bottleneck issue to be deferred. Now availability of memory bandwidth allowed the I/O bottleneck issue
however, this is no longer the case. While the I/O problem is to be deferred. Now however, this is no longer the case. While
significant at 1 Gbits/s, it is the introduction of 10 Gbits/s the I/O problem is significant at 1 Gbits/s, it is the introduction
Ethernet which is motivating an upsurge of activity in industry and of 10 Gbits/s Ethernet which is motivating an upsurge of activity
research [DAFS, IB, VI, CGZ01, Ma02, MAF+02]. in industry and research [DAFS, IB, VI, CGZ01, Ma02, MAF+02].
Because of high overhead of end-host processing in current Because of high overhead of end-host processing in current
implementations, the TCP/IP protocol stack is not used for high implementations, the TCP/IP protocol stack is not used for high
speed transfer. Instead, special purpose network fabrics, using a speed transfer. Instead, special purpose network fabrics, using a
technology generally known as remote direct memory access (RDMA), technology generally known as Remote Direct Memory Access (RDMA),
have been developed and are widely used. RDMA is a set of have been developed and are widely used. RDMA is a set of
mechanisms that allow the network adapter, under control of the mechanisms that allow the network adapter, under control of the
application, to steer data directly into and out of application application, to steer data directly into and out of application
buffers. Examples of such interconnection fabrics include Fibre buffers. Examples of such interconnection fabrics include Fibre
Channel [FIBRE] for block storage transfer, Virtual Interface Channel [FIBRE] for block storage transfer, Virtual Interface
Architecture [VI] for database clusters, Infiniband [IB], Compaq Architecture [VI] for database clusters, and Infiniband [IB],
Servernet [SRVNET], Quadrics [QUAD] for System Area Networks. Compaq Servernet [SRVNET] and Quadrics [QUAD] for System Area
These link level technologies limit application scaling in both Networks. These link level technologies limit application scaling
distance and size, meaning that the number of nodes cannot be in both distance and size, meaning that the number of nodes cannot
arbitrarily large. be arbitrarily large.
This problem statement substantiates the claim that in network I/O This problem statement substantiates the claim that in network I/O
processing, high overhead results from data movement operations, processing, high overhead results from data movement operations,
specifically copying; and that copy avoidance significantly specifically copying; and that copy avoidance significantly
decreases the processing overhead. It describes when and why the decreases the processing overhead. It describes when and why the
high processing overheads occur, explains why the overhead is high processing overheads occur, explains why the overhead is
problematic, and points out which applications are most affected. problematic, and points out which applications are most affected.
The document goes on to discuss why the problem is relevant to the
Internet and its applications, where high processing overheads work
to limit the available scaling of end-systems. Copy avoidance
eliminates overhead and latency for these systems, and further can
benefit effective distributed processing.
In addition, this document introduces an architectural approach to In addition, this document introduces an architectural approach to
solving the problem, which is developed in detail in [BT04]. It solving the problem, which is developed in detail in [BT04]. It
also discusses how the proposed technology may introduce security also discusses how the proposed technology may introduce security
concerns and how they should be addressed. concerns and how they should be addressed.
Finally, this document includes a Terminology section to aid as a
reference for several new terms introduced by RDMA.
2. The high cost of data movement operations in network I/O 2. The high cost of data movement operations in network I/O
A wealth of data from research and industry shows that copying is A wealth of data from research and industry shows that copying is
responsible for substantial amounts of processing overhead. It responsible for substantial amounts of processing overhead. It
further shows that even in carefully implemented systems, further shows that even in carefully implemented systems,
eliminating copies significantly reduces the overhead, as eliminating copies significantly reduces the overhead, as
referenced below. referenced below.
Clark et al. [CJRS89] in 1989 show that TCP [Po81] overhead Clark et al. [CJRS89] in 1989 show that TCP [Po81] overhead
processing is attributable to both operating system costs such as processing is attributable to both operating system costs such as
skipping to change at page 6, line 6 skipping to change at page 6, line 16
This is a 16% absolute reduction, a 61% relative reduction, and a This is a 16% absolute reduction, a 61% relative reduction, and a
160% relative improvement in achievable bandwidth. 160% relative improvement in achievable bandwidth.
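The three percentages above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the inputs of 26% and 10% CPU utilization are assumptions inferred from the quoted figures, since the preceding text is not shown in this excerpt, and the function name is hypothetical. The bandwidth figure further assumes that the CPU is the bottleneck, so that achievable bandwidth scales inversely with utilization at a fixed load.

```python
def copy_avoidance_gains(util_before, util_after):
    """Return (absolute reduction, relative reduction, relative bandwidth gain).

    Both arguments are the fraction of a CPU consumed at the same network load.
    """
    absolute = util_before - util_after
    relative = absolute / util_before
    # Assumption: the CPU is the limiting resource, so the achievable
    # bandwidth is proportional to 1 / utilization.
    bandwidth_gain = util_before / util_after - 1.0
    return absolute, relative, bandwidth_gain


# Assumed inputs (not stated in this excerpt): 26% before, 10% after.
absolute, relative, bandwidth_gain = copy_avoidance_gains(0.26, 0.10)
print(f"absolute reduction:         {absolute:.1%}")
print(f"relative reduction:         {relative:.1%}")
print(f"relative bandwidth gain:    {bandwidth_gain:.1%}")
```

Under these assumptions the relative reduction works out to 16/26, a little over 61%, and the bandwidth ratio to 26/10, that is, 2.6 times the original, or a 160% improvement.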
In fact, today's network interface hardware commonly offloads the In fact, today's network interface hardware commonly offloads the
checksum, which removes the other source of per-byte overhead. checksum, which removes the other source of per-byte overhead.
They also coalesce interrupts to reduce per-packet costs. Thus, They also coalesce interrupts to reduce per-packet costs. Thus,
today copying costs account for a relatively larger part of CPU today copying costs account for a relatively larger part of CPU
utilization than previously, and therefore relatively more benefit utilization than previously, and therefore relatively more benefit
is to be gained in reducing them. (Of course this argument would is to be gained in reducing them. (Of course this argument would
be specious if the amount of overhead were insignificant, but it be specious if the amount of overhead were insignificant, but it
has been shown to be substantial.) has been shown to be substantial. [BS96, B99, Ch96, KP96, TK95])
3. Memory bandwidth is the root cause of the problem 3. Memory bandwidth is the root cause of the problem
Data movement operations are expensive because memory bandwidth is Data movement operations are expensive because memory bandwidth is
scarce relative to network bandwidth and CPU bandwidth [PAC+97]. scarce relative to network bandwidth and CPU bandwidth [PAC+97].
This trend existed in the past and is expected to continue into the This trend existed in the past and is expected to continue into the
future [HP97, STREAM], especially in large multiprocessor systems. future [HP97, STREAM], especially in large multiprocessor systems.
With copies crossing the bus twice per copy, network processing With copies crossing the bus twice per copy, network processing
overhead is high whenever network bandwidth is large in comparison overhead is high whenever network bandwidth is large in comparison
skipping to change at page 8, line 17 skipping to change at page 8, line 27
application servers that run the specific applications usually on application servers that run the specific applications usually on
more powerful machines, and the third tier is backend databases. more powerful machines, and the third tier is backend databases.
Physically, the first two tiers - web server and application server Physically, the first two tiers - web server and application server
- are usually combined [Pi01]. For example an e-commerce server - are usually combined [Pi01]. For example an e-commerce server
communicates with a database server and with a customer site, or a communicates with a database server and with a customer site, or a
content distribution server connects to a server farm, or an OLTP content distribution server connects to a server farm, or an OLTP
server connects to a database and a customer site. server connects to a database and a customer site.
When network I/O uses too much memory bandwidth, performance on When network I/O uses too much memory bandwidth, performance on
network paths between tiers can suffer. (There might also be network paths between tiers can suffer. (There might also be
performance issues on SAN paths used either by the database tier or performance issues on Storage Area Network paths used either by the
the application tier.) The high overhead from network-related database tier or the application tier.) The high overhead from
memory copies diverts system resources from other application network-related memory copies diverts system resources from other
processing. It also can create bottlenecks that limit total system application processing. It also can create bottlenecks that limit
performance. total system performance.
There are a large and growing number of these application servers There are a large and growing number of these application servers
distributed throughout the Internet. In 1999 approximately 3.4 distributed throughout the Internet. In 1999 approximately 3.4
million server units were shipped, in 2000, 3.9 million units, and million server units were shipped, in 2000, 3.9 million units, and
the estimated annual growth rate for 2000-2004 was 17 percent the estimated annual growth rate for 2000-2004 was 17 percent
[Ne00, Pa01]. [Ne00, Pa01].
There is high motivation to maximize the processing capacity of There is high motivation to maximize the processing capacity of
each CPU, as scaling by adding CPUs one way or another has each CPU, as scaling by adding CPUs one way or another has
drawbacks. For example, adding CPUs to a multiprocessor will not drawbacks. For example, adding CPUs to a multiprocessor will not
skipping to change at page 9, line 31 skipping to change at page 9, line 41
the server, the performance of 3 servers was compared. One server the server, the performance of 3 servers was compared. One server
was Apache, another an optimized server called Flash, and the third was Apache, another an optimized server called Flash, and the third
the Flash server running IO-Lite, called Flash-Lite with zero copy. the Flash server running IO-Lite, called Flash-Lite with zero copy.
The measurement was of throughput in requests/second as a function The measurement was of throughput in requests/second as a function
of the number of slow background clients that could be served. As of the number of slow background clients that could be served. As
the table shows, Flash-Lite has better throughput, especially as the table shows, Flash-Lite has better throughput, especially as
the number of clients increases. the number of clients increases.
          Apache          Flash    Flash-Lite              Apache             Flash       Flash-Lite
          ------          -----    ----------              ------             -----       ----------
#Clients  Thruput reqs/s  Thruput  Thruput       #Clients  Throughput reqs/s  Throughput  Throughput
0         520             610      890           0         520                610         890
16        390             490      890           16        390                490         890
32        360             490      850           32        360                490         850
64        360             490      890           64        360                490         890
128       310             450      880           128       310                450         880
256       310             440      820           256       310                440         820
Traditional Web servers (which mostly send data and can keep most Traditional Web servers (which mostly send data and can keep most
of their content in the file cache) are not the worst case for copy of their content in the file cache) are not the worst case for copy
overhead. Web proxies (which often receive as much data as they overhead. Web proxies (which often receive as much data as they
send) and complex Web servers based on SANs or multi-tier systems send) and complex Web servers based on System Area Networks or
will suffer more from copy overheads than in the example above. multi-tier systems will suffer more from copy overheads than in the
example above.
5. Copy Avoidance Techniques 5. Copy Avoidance Techniques
There has been extensive research investigation and industry There has been extensive research investigation and industry
experience with two main alternative approaches to eliminating data experience with two main alternative approaches to eliminating data
movement overhead, often along with improving other Operating movement overhead, often along with improving other Operating
System processing costs. In one approach, hardware and/or software System processing costs. In one approach, hardware and/or software
changes within a single host reduce processing costs. In another changes within a single host reduce processing costs. In another
approach, memory-to-memory networking [MAF+02], the exchange of approach, memory-to-memory networking [MAF+02], the exchange of
explicit data placement information between hosts allows them to explicit data placement information between hosts allows them to
skipping to change at page 10, line 23 skipping to change at page 10, line 34
using a networking protocol to exchange information, the network using a networking protocol to exchange information, the network
adapter, under control of the application, places data directly adapter, under control of the application, places data directly
into and out of application buffers, reducing the need for data into and out of application buffers, reducing the need for data
movement. Commonly this approach is called RDMA, Remote Direct movement. Commonly this approach is called RDMA, Remote Direct
Memory Access. Memory Access.
As discussed below, research and industry experience has shown that As discussed below, research and industry experience has shown that
copy avoidance techniques within the receiver processing path alone copy avoidance techniques within the receiver processing path alone
have proven to be problematic. The research special purpose host have proven to be problematic. The research special purpose host
adapter systems had good performance and can be seen as precursors adapter systems had good performance and can be seen as precursors
for the commercial RDMA-based NICs [KSZ95, DWB+93]. In software, for the commercial RDMA-based adapters [KSZ95, DWB+93]. In
many implementations have successfully achieved zero-copy transmit, software, many implementations have successfully achieved zero-copy
but few have accomplished zero-copy receive. And those that have transmit, but few have accomplished zero-copy receive. And those
done so make strict alignment and no-touch requirements on the that have done so make strict alignment and no-touch requirements
application, greatly reducing the portability and usefulness of the on the application, greatly reducing the portability and usefulness
implementation. of the implementation.
In contrast, experience has proven satisfactory with memory-to- In contrast, experience has proven satisfactory with memory-to-
memory systems that permit RDMA - performance has been good and memory systems that permit RDMA - performance has been good and
there have not been system or networking difficulties. RDMA is a there have not been system or networking difficulties. RDMA is a
single solution. Once implemented, it can be used with any OS and single solution. Once implemented, it can be used with any OS and
machine architecture, and it does not need to be revised when machine architecture, and it does not need to be revised when
either of these changes. either of these changes.
In early work, one goal of the software approaches was to show that In early work, one goal of the software approaches was to show that
TCP could go faster with appropriate OS support [CJRS89, CFF+94]. TCP could go faster with appropriate OS support [CJRS89, CFF+94].
skipping to change at page 12, line 37 skipping to change at page 12, line 48
interconnects and the deployment of these hosts on Internet interconnects and the deployment of these hosts on Internet
Protocol-based networks leads to the desirability to layer such a Protocol-based networks leads to the desirability to layer such a
solution on the Internet Protocol Suite. The architecture solution on the Internet Protocol Suite. The architecture
described in [BT04] is such a proposal. described in [BT04] is such a proposal.
7. Security Considerations 7. Security Considerations
Solutions to the problem of reducing copying overhead in high Solutions to the problem of reducing copying overhead in high
bandwidth transfers via one or more protocols may introduce new bandwidth transfers via one or more protocols may introduce new
security concerns. Any proposed solution must be analyzed for security concerns. Any proposed solution must be analyzed for
security threats and any such threats addressed. Potential security vulnerabilities and any such vulnerabilities addressed.
security weaknesses due to resource issues that might lead to Potential security weaknesses due to resource issues that might
denial-of-service attacks, overwrites and other concurrent lead to denial-of-service attacks, overwrites and other concurrent
operations, the ordering of completions as required by the RDMA operations, the ordering of completions as required by the RDMA
protocol, the granularity of transfer, and any other identified protocol, the granularity of transfer, and any other identified
threats; need to be examined, described and an adequate solution to vulnerabilities; need to be examined, described and an adequate
them found. resolution to them found.
Layered atop Internet transport protocols, the RDMA protocols will Layered atop Internet transport protocols, the RDMA protocols will
gain leverage from and must permit integration with Internet gain leverage from and must permit integration with Internet
security standards, such as IPSec and TLS [IPSEC, TLS]. A thorough security standards, such as IPsec and TLS [IPSEC, TLS]. However,
analysis of the degree to which these protocols address potential there may be implementation ramifications for certain security
threats is required. approaches with respect to RDMA, due to its copy avoidance.
IPsec, operating to secure the connection on a packet-by-packet
basis, seems to be a natural fit to securing RDMA placement, which
operates in conjunction with transport. Because RDMA enables an
implementation to avoid buffering, it is preferable to perform all
applicable security protection prior to processing each transport
and RDMA layer segment. Such a layering enables the most efficient
secure RDMA implementation.
The TLS record protocol, on the other hand, is layered on top of
reliable transports and cannot provide such security assurance
until an entire record is available, which may require the
buffering and/or assembly of several distinct messages prior to TLS
processing. This defers RDMA processing and introduces overheads
that RDMA is designed to avoid. TLS therefore is viewed as
potentially a less natural fit for protecting the RDMA protocols.
A thorough analysis of the degree to which security protocols
address potential threats via RDMA is required.
Security for an RDMA design requires more than just securing the Security for an RDMA design requires more than just securing the
communication channel. While it is necessary to be able to communication channel. While it is necessary to be able to
guarantee channel properties such as privacy, integrity, and guarantee channel properties such as confidentiality, integrity,
authentication, these properties cannot defend against all attacks and authentication, these properties cannot defend against all
from properly authenticated peers, which might be malicious, attacks from properly authenticated peers, which might be
compromised, or buggy. For example, an RDMA peer should not be malicious, compromised, or buggy. For example, an RDMA peer should
able to read or write memory regions without prior consent. not be able to read or write memory regions without prior consent.
Further, it must not be possible to evade consistency checks at the Further, it must not be possible to evade consistency checks at the
recipient. The RDMA design must allow the recipient to rely on its recipient. The RDMA design must allow the recipient to rely on its
consistent memory contents by controlling peer access to memory consistent memory contents by controlling peer access to memory
regions explicitly, and must disallow peer access to regions when regions explicitly. Peers which do not pass authentication and
not authorized. authorization checks must not be permitted to connect to an
inappropriate endpoint. Peer accesses must be authenticated and
made subject to authorization checks prior to any operation to a
memory region.
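The requirement that every peer access be checked against explicit, application-controlled grants before it touches a memory region can be sketched as follows. This is a minimal illustration, not a design taken from the draft or from [BT04]: the names RegionTable, Region, READ and WRITE are hypothetical, peers are assumed to have been authenticated already, and the sequential handles are for readability only.

```python
from dataclasses import dataclass

# Hypothetical permission bits for the two RDMA operation types.
READ = 1
WRITE = 2


@dataclass
class Region:
    peer: str        # the authenticated peer the application granted access to
    length: int      # size of the registered region in bytes
    perms: int       # bitwise OR of READ and/or WRITE
    valid: bool = True


class RegionTable:
    """Tracks which memory regions the application has exposed, and to whom."""

    def __init__(self):
        self._regions = {}
        self._next_handle = 1  # sequential handles, for illustration only

    def register(self, peer, length, perms):
        """Application explicitly grants a peer access to a region."""
        handle = self._next_handle
        self._next_handle += 1
        self._regions[handle] = Region(peer, length, perms)
        return handle

    def revoke(self, handle):
        """Application withdraws consent; later accesses must fail."""
        self._regions[handle].valid = False

    def check(self, peer, handle, offset, length, op):
        """Return True only if this access is authorized and in bounds."""
        region = self._regions.get(handle)
        if region is None or not region.valid:
            return False                      # no such grant, or revoked
        if region.peer != peer:
            return False                      # granted to a different peer
        if not (region.perms & op):
            return False                      # operation type not permitted
        return offset >= 0 and length > 0 and offset + length <= region.length


table = RegionTable()
handle = table.register("peer-a", 4096, READ)
print(table.check("peer-a", handle, 0, 4096, READ))    # permitted read
print(table.check("peer-a", handle, 0, 4096, WRITE))   # write was never granted
print(table.check("peer-b", handle, 0, 16, READ))      # wrong peer
```

The check runs before any data is placed or fetched, so a properly authenticated but malicious or buggy peer is still confined to the regions, offsets, and operation types the application agreed to.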
The RDMA protocols must ensure that regions addressable by RDMA The RDMA protocols must ensure that regions addressable by RDMA
peers be under strict application control. Remote access to local peers be under strict application control. Remote access to local
memory by a network peer introduces a number of potential security memory by a network peer introduces a number of potential security
concerns. This becomes particularly important in the Internet concerns. This becomes particularly important in the Internet
context, where such access can be exported globally. context, where such access can be exported globally.
The RDMA protocols carry in part what is essentially user The RDMA protocols carry in part what is essentially user
information, explicitly including addressing information and information, explicitly including addressing information and
operation type (read or write), and implicitly including protection operation type (read or write), and implicitly including protection
skipping to change at page 13, line 37 skipping to change at page 14, line 22
higher level aspects in addition to the basic formation of higher level aspects in addition to the basic formation of
messages. The semantics associated with each class of error must messages. The semantics associated with each class of error must
be clearly defined, and the expected action to be taken on mismatch be clearly defined, and the expected action to be taken on mismatch
be specified. In some cases, this will result in a catastrophic be specified. In some cases, this will result in a catastrophic
error on the RDMA association, however in others a local or remote error on the RDMA association, however in others a local or remote
error may be signalled. Certain of these errors may require error may be signalled. Certain of these errors may require
consideration of abstract local semantics, which must be carefully consideration of abstract local semantics, which must be carefully
specified so as to provide useful behavior while not constraining specified so as to provide useful behavior while not constraining
the implementation. the implementation.
8. Acknowledgements 8. Terminology
This section contains general terminology definitions for this
document and for Remote Direct Memory Access in general.
Remote Direct Memory Access (RDMA)
A method of accessing memory on a remote system in which the
local system specifies the location of the data to be
transferred.
RDMA Protocol
A protocol that supports RDMA Operations to transfer data
between systems.
Fabric
The collection of links, switches, and routers that connect a
set of systems.
Storage Area Network (SAN)
A network where disks, tapes and other storage devices are
made available to one or more end-systems via a fabric.
System Area Network
A network where clustered systems share services, such as
storage and interprocess communication, via a fabric.
Fibre Channel (FC)
An ANSI standard link layer with associated protocols,
typically used to implement Storage Area Networks. [FIBRE]
Virtual Interface Architecture (VI, VIA)
An RDMA interface definition developed by an industry group
and implemented with a variety of differing wire protocols.
[VI]
InfiniBand (IB)
An RDMA interface, protocol suite and link layer specification
defined by an industry trade association. [IB]
9. Acknowledgements
Jeff Chase generously provided many useful insights and Jeff Chase generously provided many useful insights and
information. Thanks to Jim Pinkerton for many helpful discussions. information. Thanks to Jim Pinkerton for many helpful discussions.
9. Informative References 10. Informative References
[BCF+95] [BCF+95]
N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L.
Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per- Seitz, J. N. Seizovic, and W. Su. "Myrinet - A gigabit-per-
second local-area network", IEEE Micro, February 1995 second local-area network", IEEE Micro, February 1995
[BJM+96] [BJM+96]
G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes, G. Buzzard, D. Jacobson, M. Mackey, S. Marovich, J. Wilkes,
"An implementation of the Hamlyn send-managed interface "An implementation of the Hamlyn send-managed interface
architecture", in Proceedings of the Second Symposium on architecture", in Proceedings of the Second Symposium on
skipping to change at page 14, line 23 skipping to change at page 15, line 47
[Br99] [Br99]
J. C. Brustoloni, "Interoperation of copy avoidance in network J. C. Brustoloni, "Interoperation of copy avoidance in network
and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542 and file I/O", Proceedings of IEEE Infocom, 1999, pp. 534-542
[BS96] [BS96]
J. C. Brustoloni, P. Steenkiste, "Effects of buffering J. C. Brustoloni, P. Steenkiste, "Effects of buffering
semantics on I/O performance", Proceedings OSDI'96, USENIX, semantics on I/O performance", Proceedings OSDI'96, USENIX,
Seattle, WA October 1996, pp. 277-291 Seattle, WA October 1996, pp. 277-291
RFC Editor note:
Replace following architecture draft-ietf- name, status and date
with appropriate reference when assigned.
[BT04] [BT04]
S. Bailey, T. Talpey, "The Architecture of Direct Data S. Bailey, T. Talpey, "The Architecture of Direct Data
Placement (DDP) And Remote Direct Memory Access (RDMA) On Placement (DDP) And Remote Direct Memory Access (RDMA) On
Internet Protocols", Internet Draft Work in Progress, draft- Internet Protocols", Internet Draft Work in Progress, draft-
ietf-rddp-arch-04, January 2004 ietf-rddp-arch-05, July 2004
[CFF+94] [CFF+94]
C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A. C-H Chang, D. Flower, J. Forecast, H. Gray, B. Hawe, A.
Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High- Nadkarni, K. K. Ramakrishnan, U. Shikarpur, K. Wilde, "High-
performance TCP/IP and UDP/IP networking in DEC OSF/1 for performance TCP/IP and UDP/IP networking in DEC OSF/1 for
Alpha AXP", Proceedings of the 3rd IEEE Symposium on High Alpha AXP", Proceedings of the 3rd IEEE Symposium on High
Performance Distributed Computing, August 1994, pp. 36-42 Performance Distributed Computing, August 1994, pp. 36-42
[CGY01] [CGY01]
J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system J. S. Chase, A. J. Gallatin, and K. G. Yocum, "End system
skipping to change at page 18, line 44 skipping to change at page 20, line 22
Tom Talpey Tom Talpey
Network Appliance Network Appliance
375 Totten Pond Road 375 Totten Pond Road
Waltham, MA 02451 USA Waltham, MA 02451 USA
Phone: +1 781 768 5329 Phone: +1 781 768 5329
Email: thomas.talpey@netapp.com Email: thomas.talpey@netapp.com
Full Copyright Statement Full Copyright Statement
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2004). This document is
subject to the rights, licenses and restrictions contained in BCP
78 and except as set forth therein, the authors retain all their
rights.
This document and translations of it may be copied and furnished to This document and the information contained herein are provided on
others, and derivative works that comment on or otherwise explain an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
it or assist in its implementation may be prepared, copied, REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
published and distributed, in whole or in part, without restriction THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
of any kind, provided that the above copyright notice and this EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
paragraph are included on all such copies and derivative works. THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
However, this document itself may not be modified in any way, such ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
as by removing the copyright notice or references to the Internet PARTICULAR PURPOSE.
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.
The limited permissions granted above are perpetual and will not be Intellectual Property
revoked by the Internet Society or its successors or assigns. The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.
This document and the information contained herein is provided on Copies of IPR disclosures made to the IETF Secretariat and any
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET assurances of licenses to be made available, or the result of an
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR attempt made to obtain a general license or permission for the use
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF of such proprietary rights by implementers or users of this
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED specification can be obtained from the IETF on-line IPR repository
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf-
ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.