Internet Engineering Task Force Phil Karn
INTERNET DRAFT Aaron Falk
Joe Touch
Marie-Jose Montpetit
File: draft-ietf-pilc-link-design-00.txt June, 1999
Expires: December, 1999
Advice for Internet Subnetwork Designers
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document provides advice to the designers of digital
communication equipment, link layer protocols and packet switched
subnetworks (collectively referred to as subnetworks) who wish to
support the Internet protocols but who may be unfamiliar with the
architecture of the Internet and the implications of their design
choices on the performance and efficiency of the Internet.
This document represents an evolving consensus of the members of the
IETF Performance Implications of Link Characteristics (PILC) working
group.
Introduction and Overview
The Internet Protocol [RFC791] is the core protocol of the
world-wide Internet that defines a simple "connectionless"
packet-switched network. The success of the Internet is largely
attributed to the simplicity of IP, the "end-to-end principle" on
which the Internet is based, and the resulting ease of carrying IP
on a wide variety of subnetworks not necessarily designed with IP in
mind.
But while many subnetworks carry IP, they do not necessarily do so
with maximum efficiency, minimum complexity or minimum cost. Nor do
they necessarily implement the functions needed to efficiently
support newer Internet features of increasing importance, such as
multicasting or quality of service.
With the explosive growth of the Internet, IP is an increasingly
large fraction of the traffic carried by the world's
telecommunications networks. It therefore makes sense to optimize
both existing and new subnetwork technologies for IP as much as
possible.
Optimizing a subnetwork for IP involves three complementary
considerations:
1. Providing functionality sufficient to carry IP.
2. Eliminating unnecessary functions that increase cost or
complexity.
3. Choosing subnetwork parameters that maximize the performance of
the Internet protocols.
Because IP is so simple, consideration 2 is more of an issue than
consideration 1. I.e., subnetwork designers make many more errors of
commission than errors of omission. But certain enhanced Internet
features, such as multicasting and quality-of-service, rely on
support from the underlying subnetworks beyond that necessary to
carry "traditional" unicast, best-effort IP.
A major consideration in the efficient design of any layered
communication network is the choice of the layer(s) in which to
implement a given feature. This issue was first addressed in the
seminal paper "End-to-End Arguments in System Design" [SRC81]. This
paper argued that many -- if not most -- network functions are best
implemented on an end-to-end basis, i.e., at the higher protocol
layers. Duplicating these functions at the lower levels is at best
redundant, and can even be harmful. The architecture of the Internet
was heavily influenced by this philosophy, and in our view it was
crucial to the Internet's success.
The remainder of this document discusses the various subnetwork
design issues that the authors consider relevant to efficient IP
support.
Maximum Transmission Units (MTUs) and IP Fragmentation
IP packets (datagrams) vary in size from 20 bytes (the size of the
IP header alone) to a maximum of 65535 bytes. Subnetworks need not
support maximum-sized (64KB) IP packets, as IP provides a scheme
that breaks packets that are too large for a given subnetwork into
fragments that travel as independent packets and are reassembled at
the destination. The maximum packet size supported by a subnetwork
is known as its Maximum Transmission Unit (MTU).
Subnetworks may, but are not required to, indicate the lengths of the
packets they carry. One example is Ethernet with the DIX (not IEEE
802.3) header, which lacks a length field to indicate the true data
length when the packet is padded to the 60-byte minimum. This is
not a problem for IP because it carries its own length field.
In IP version 4 (current IP), fragmentation can occur at either the
sending host or in an intermediate router, and fragments can be
further fragmented at subsequent routers if necessary.
In IP version 6, fragmentation can occur only at the sending host;
it cannot occur in a router.
Both IPv4 and IPv6 provide a "Path MTU Discovery" procedure
[RFC????] that allows the sending host to avoid fragmentation by
discovering the minimum MTU along a given path and reducing its
packet sizes accordingly. This procedure is optional in IPv4 but
mandatory in IPv6 where there is no router fragmentation.
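As an illustration of the procedure (not of any particular
implementation), the following Python sketch lowers its working MTU
whenever a probe sent with the Don't Fragment bit is rejected. The
send_with_df() helper and the PacketTooBig exception are assumptions
of this sketch; real hosts react to ICMP "Fragmentation Needed" or
"Packet Too Big" messages generated by routers.

    class PacketTooBig(Exception):
        """Raised by the (assumed) probe helper when a router rejects
        a DF-marked packet; carries the next-hop MTU it reported."""
        def __init__(self, next_hop_mtu):
            self.next_hop_mtu = next_hop_mtu

    def discover_path_mtu(send_with_df, first_hop_mtu, floor=68):
        """Reduce the working MTU until a DF-marked probe gets through."""
        mtu = first_hop_mtu
        while mtu >= floor:
            try:
                send_with_df(mtu)   # probe the path at the current size
                return mtu          # accepted: this size fits the path
            except PacketTooBig as too_big:
                # Fall back to the MTU reported by the complaining
                # router, guarding against reports that fail to shrink
                # the value.
                mtu = min(too_big.next_hop_mtu, mtu - 1)
        return floor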
The Path MTU Discovery procedure (and the deletion of router
fragmentation in IPv6) reflects a consensus of the Internet
technical community that IP fragmentation is best avoided. This
requires that subnetworks support MTUs that are "reasonably"
large. The smallest MTU that IPv4 allows is 68 bytes, but such a
small value is clearly unreasonable.
If a subnetwork cannot directly support a "reasonable" MTU with
native framing mechanisms, it should internally fragment. That is,
it should transparently break IP packets into internal data elements
and reassemble them at the other end of the subnetwork.
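The following Python sketch shows the arithmetic of IP-style
fragmentation for an arbitrary MTU, whether performed by IP itself or
internally by the subnetwork. It is illustrative only: real IPv4
fragmentation also copies selected header options and recomputes
header checksums, which are omitted here.

    IP_HEADER_LEN = 20  # bytes, assuming a header with no options

    def fragment(payload, mtu):
        """Split a payload into (more_fragments, byte_offset, data)
        tuples.  As in IPv4, fragment offsets must be multiples of 8
        bytes, so every fragment except the last carries a multiple
        of 8 data bytes."""
        max_data = (mtu - IP_HEADER_LEN) // 8 * 8
        if max_data <= 0:
            raise ValueError("MTU too small to carry any data")
        frags, offset = [], 0
        while offset < len(payload):
            chunk = payload[offset:offset + max_data]
            more = (offset + len(chunk)) < len(payload)
            frags.append((more, offset, chunk))
            offset += len(chunk)
        return frags

    # A 4000-byte payload crossing a 1500-byte MTU yields fragments
    # carrying 1480, 1480 and 1040 data bytes.
    for more, offset, data in fragment(b"x" * 4000, 1500):
        print("MF=%d offset=%d len=%d" % (more, offset, len(data)))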
This leaves the question of what is a "reasonable" MTU. Ethernet
(10 and 100 Mb/s) has an MTU of 1500 bytes, and because of its
ubiquity few Internet paths have MTUs larger than this value. This
severely limits the utility of larger MTUs provided by other
subnetworks. But larger MTUs are increasingly desirable on high
speed subnetworks to reduce the per-packet processing overhead in
host computers, and implementers are encouraged to provide them even
though they may not be usable when Ethernet is also in the path.
[add specific advice for MTUs on slow and fast networks -- make MTU
a function of speed?]
Framing on Connection-Oriented Subnetworks
IP needs a way to mark the beginning and end of each
variable-length, asynchronous IP packet. While connectionless
subnetworks generally provide this feature, many connection-oriented
subnetworks do not. Some examples include:
1. leased lines carrying a synchronous bit stream;
2. ISDN B-channels carrying a synchronous octet stream;
3. dialup telephone modems carrying an asynchronous octet stream;
and
4. Asynchronous Transfer Mode (ATM) networks carrying an
asynchronous stream of fixed-sized "cells"
The Internet community has defined packet framing methods for all
these subnetworks. The Point-To-Point Protocol (PPP) [] is
applicable to bit synchronous, octet synchronous and octet
asynchronous links (i.e., examples 1-3 above). ATM has its own
framing method described in [RFC1577].
Because these framing methods are usually implemented partly or
wholly in software, performance may suffer at higher speeds. At
progressively lower speeds, a cell-, octet- or bit-oriented
interface to a connection-oriented subnetwork may be acceptable.
The definition of "low speed" depends on the nature of the hardware
interface and the processing capacity available to implement the
necessary framing method in software.
At high speeds, a subnetwork should provide a framed interface
capable of carrying asynchronous, variable-length IP datagrams. The
maximum packet size supported by this interface is discussed above
in the MTU/Fragmentation section. The subnetwork may implement this
facility in any convenient manner. In particular, IP packet
boundaries need not coincide with any framing or synchronization
mechanisms internal to the subnetwork.
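As a concrete example of framing an asynchronous octet stream, the
following Python sketch shows HDLC-style octet stuffing of the kind
PPP uses on asynchronous links (flag 0x7E, escape 0x7D, escaped
octets XORed with 0x20). It omits the address/control fields, FCS and
async control-character map of real PPP; it shows only how packet
boundaries can be recovered from an unframed octet stream.

    FLAG, ESC = 0x7E, 0x7D

    def frame(packet):
        """Delimit one packet with flags, escaping flag/escape octets."""
        out = bytearray([FLAG])
        for b in packet:
            if b in (FLAG, ESC):
                out += bytes([ESC, b ^ 0x20])
            else:
                out.append(b)
        out.append(FLAG)
        return bytes(out)

    def deframe(stream):
        """Yield packets recovered from a stuffed octet stream."""
        pkt, esc, in_frame = bytearray(), False, False
        for b in stream:
            if b == FLAG:
                if in_frame and pkt:
                    yield bytes(pkt)
                pkt, esc, in_frame = bytearray(), False, True
            elif esc:
                pkt.append(b ^ 0x20)
                esc = False
            elif b == ESC:
                esc = True
            else:
                pkt.append(b)

    # Round trip: two packets share one octet stream.
    assert list(deframe(frame(b"\x7e abc") + frame(b"def"))) == \
        [b"\x7e abc", b"def"]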
[comments about common packet sizes and internal ATM wastage]
Connection-Oriented Subnetworks
IP has no notion of a "connection"; it is a purely connectionless
protocol. When a connection is required by an application, it is
usually provided by TCP, the Transmission Control Protocol, running
atop IP on an end-to-end basis.
Connection-oriented subnetworks can be (and are) widely used to
carry IP, but often with considerable complexity. Subnetworks with
a few nodes can simply open a permanent connection between each pair
of nodes, as is frequently done with ATM. But the number of
connections grows as the square of the number of nodes, so this
is clearly impractical for large subnetworks. A "shim" layer between
IP and the subnetwork is therefore required to manage connections in
the latter.
These shim layers typically open subnetwork connections as needed
when an IP packet is queued for transmission and close them after an
idle timeout. There is no relation between subnetwork connections
and any connections that may exist at higher layers (e.g., TCP).
Because Internet traffic is typically bursty and
transaction-oriented, it is often difficult to pick an optimal idle
timeout. If the timeout is too short, subnetwork connections are
opened and closed rapidly, possibly over-stressing the subnetwork's
call management system (especially if it was designed for the call
holding times typical of voice traffic). If the timeout is too long,
subnetwork connections
are idle much of the time, wasting any resources dedicated to them
by the subnetwork.
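The following Python sketch illustrates the shim behaviour described
above. The open_svc() and close_svc() callbacks stand in for whatever
call-setup interface the subnetwork actually provides (e.g., ATM
signalling), and the connection handle is assumed to expose a send()
method; these, and the 60-second default timeout, are assumptions of
the sketch rather than properties of any real subnetwork.

    import time

    class ConnectionShim:
        """Per-destination cache of subnetwork connections, opened on
        demand and torn down after an idle timeout."""

        def __init__(self, open_svc, close_svc, idle_timeout=60.0):
            self.open_svc, self.close_svc = open_svc, close_svc
            self.idle_timeout = idle_timeout
            self.conns = {}     # destination -> (handle, last_use_time)

        def send(self, dest, packet):
            handle, _ = self.conns.get(dest, (None, None))
            if handle is None:
                handle = self.open_svc(dest)   # call setup on first packet
            self.conns[dest] = (handle, time.monotonic())
            handle.send(packet)

        def reap_idle(self):
            """Tear down connections that have been idle too long."""
            now = time.monotonic()
            for dest, (handle, last) in list(self.conns.items()):
                if now - last > self.idle_timeout:
                    self.close_svc(handle)
                    del self.conns[dest]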
The ideal subnetwork for IP is connectionless. Connection-oriented
networks that dedicate minimal resources to each connection (e.g.,
ATM) are a distant second, and connection-oriented networks that
dedicate a fixed amount of bandwidth to each connection (e.g., the
PSTN, including ISDN) are the least efficient. If such subnetworks
must be used to carry IP, their call-processing systems should be
capable of rapid call set-up and tear-down.
Bandwidth on Demand (BoD) Subnets (Aaron Falk)
Wireless networks, including both satellite and terrestrial, may use
Bandwidth on Demand (BoD). BoD, which is implemented at the link
layer by Demand Assignment Multiple Access (DAMA) in TDMA systems,
is currently one of the proposed mechanisms for efficiently sharing
limited spectrum resources amongst a large number of users.
The design parameters for BoD are similar to those in connection
oriented subnetworks, however the implementations may be very
different. In BoD, the user typically requests access to the shared
channel for some duration. Access may be allocated in terms of a
period of time at a specific rate, a certain number of packets, or
until the user chooses to release the channel. Access may be
coordinated through a central management entity or through a
distributed algorithm amongst the users. The resource shared may be
a terrestrial wireless hop, a satellite uplink, or an end-to-end
satellite channel.
Long-delay BoD subnets pose problems similar to those of
connection-oriented networks in terms of anticipating traffic
arrivals. While
connection oriented subnets hold idle channels open expecting new
data to arrive, BoD subnets request channel access based on buffer
occupancy (or expected buffer occupancy) on the sending port. Poor
performance will likely result if the sender does not anticipate
additional traffic arriving at that port during the time it takes to
grant a transmission request. It is recommended that the algorithm
have the capability to extend a hold on the channel for data that
has arrived after the original request was generated (this may be
done by piggybacking new requests on user data).
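A rough Python sketch of such a request policy follows. The 512-byte
slot size and the request/grant interface are illustrative
assumptions, not features of any particular DAMA system; the point is
only that each transmission carries an updated backlog report, so
data arriving after the original request can extend the hold without
a new stand-alone request.

    from collections import deque

    class BodSender:
        SLOT_BYTES = 512        # assumed capacity of one granted slot

        def __init__(self):
            self.queue = deque()    # packets awaiting channel access

        def enqueue(self, packet):
            self.queue.append(packet)

        def slots_needed(self):
            """Slots required to drain the current backlog."""
            backlog = sum(len(p) for p in self.queue)
            return -(-backlog // self.SLOT_BYTES)   # ceiling division

        def initial_request(self):
            """Stand-alone capacity request, e.g. on a contention channel."""
            return self.slots_needed()

        def transmit_in_slot(self):
            """Fill one granted slot and piggyback a fresh backlog report."""
            burst, room = [], self.SLOT_BYTES
            while self.queue and len(self.queue[0]) <= room:
                pkt = self.queue.popleft()
                burst.append(pkt)
                room -= len(pkt)
            # The piggybacked report covers data that arrived after the
            # original request, avoiding a round trip for a new request.
            return burst, self.slots_needed()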
There are a wide variety of BoD protocols available and there has
been relatively little comprehensive research on the interactions
between the BoD mechanisms and Internet protocol performance. A
tradeoff exists between the time a user is allowed to hold the
channel to drain its port buffers and the additional latency imposed
on other users who must wait for access to the channel. It
is desirable to design mechanisms that constrain the BoD-imposed
latency variation. This will be helpful in preventing spurious
timeouts from TCP.
Reliability and Error Control
In the Internet architecture, the ultimate responsibility for error
recovery is at the end points. The Internet may occasionally drop,
corrupt, duplicate or reorder packets, and the transport protocol
(e.g., TCP) or application (e.g., if UDP is used) must recover from
these errors on an end-to-end basis. Error recovery in the
subnetwork is therefore justified only to the extent that it can
enhance overall performance.
Internet transport protocols usually cannot distinguish between
packet loss due to congestion and packet loss due to a subnetwork or
link error; it is the responsibility of the end-to-end protocol
(e.g., TCP) or the application (if UDP is used) to detect and recover
from these events. Excessive subnetwork packet loss is therefore a
performance issue, not a reliability issue.
[True reliability can only be provided on an end-to-end basis;
subnet reliability can sometimes be justified as a performance
enhancement. Transport protocols interpret packet loss as a sign of
congestion and slow down, which implies poor performance on links
with high random error rates due to noise. Subnet reliability should
be "lightweight", i.e., it only has to be "good enough", *not*
perfect. "Good enough" means less than one end-to-end error per round
trip time; transport protocol performance decreases dramatically when
this rate is exceeded. FEC is best implemented in the subnet;
interleaving delays of less than one RTT are acceptable.]
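The "good enough" rule of thumb above can be turned into a rough
target for residual (post-FEC/ARQ) loss, as in the following Python
sketch. The link figures used in the example are arbitrary, not
measurements of any particular subnetwork.

    def max_tolerable_loss(bandwidth_bps, rtt_s, packet_bits):
        """Per-packet residual loss rate giving < 1 loss per RTT."""
        packets_per_rtt = bandwidth_bps * rtt_s / packet_bits
        return 1.0 / packets_per_rtt

    # Example: a 2 Mb/s link with a 500 ms round-trip time and
    # 1500-byte packets holds about 83 packets per RTT, so residual
    # loss should stay below roughly 1.2% after link-layer recovery.
    print(max_tolerable_loss(2e6, 0.5, 1500 * 8))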
Quality of Service, Fairness vs Performance, Congestion signalling
[subnet hooks for QOS bits]
Delay Characteristics
[self clocking TCP, (re)transmission shaping]
Bandwidth Asymmetries
Some subnetworks may provide asymmetric bandwidth and the Internet
protocol suite will generally still work fine. However, there is a
case when such a scenario reduces TCP performance. Since TCP data
segments are "clocked" out by returning acknowledgments, TCP
senders are limited by the rate at which ACKs can be returned
[BPK98]. Therefore, when the ratio of the bandwidth of the channel
carrying the data to the bandwidth of the channel carrying the
acknowledgments (ACKs) is too large, the slow return of the ACKs
directly impacts performance. Since ACKs are generally smaller than
data segments, TCP can tolerate some asymmetry.
One way to cope with asymmetric subnetworks is to increase the size
of the data segments as much as possible. This allows more data to
be sent per ACK, and therefore mitigates the slow flow of ACKs.
Using the delayed acknowledgment mechanism [Bra89], which reduces
the number of ACKs transmitted by the receiver by roughly half, can
also improve performance by reducing the congestion on the ACK
channel.
Several other coping strategies exist (ack filtering, ack congestion
control, etc.).
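The tolerable degree of asymmetry can be estimated roughly as
follows. The segment and ACK sizes are typical values rather than
properties of any particular subnetwork, and link-layer framing
overhead is ignored.

    def max_asymmetry_ratio(data_segment_bytes, ack_bytes,
                            segments_per_ack):
        """Data/ACK bandwidth ratio at which ACK return just keeps up."""
        return (data_segment_bytes * segments_per_ack) / ack_bytes

    # 1500-byte segments, 40-byte ACKs, delayed ACKs (one ACK per two
    # segments): ratios beyond about 75:1 leave the sender limited by
    # the ACK return rate.
    print(max_asymmetry_ratio(1500, 40, 2))    # 75.0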
Buffering, flow & congestion control
[atm dropping individual cells in a packet means the entire packet
must be dropped]
Compression
[Best done end-to-end. The required processing is more available
there, and the benefits are realized by more network elements. If
compression is provided in a subnetwork, it *must* detect
incompressible data and "get out of the way", i.e., not make the
compressed data larger in an attempt to compress it further, and it
must not degrade throughput. Another consideration: even when the
user data is compressible, subnetwork compression effectiveness is
sometimes limited by the speed of the interface to the subnetwork.]
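The following Python sketch illustrates one way a subnetwork
compressor can "get out of the way": each block carries a one-byte
flag saying whether it was actually compressed, so incompressible
(e.g., already-compressed or encrypted) payloads are expanded by at
most that one byte. The framing byte is an assumption of this sketch,
not a standard format.

    import os
    import zlib

    def compress_block(data):
        deflated = zlib.compress(data)
        if len(deflated) < len(data):
            return b"\x01" + deflated   # flag 1: payload is compressed
        return b"\x00" + data           # flag 0: sent unmodified

    def decompress_block(block):
        flag, payload = block[0], block[1:]
        return zlib.decompress(payload) if flag == 1 else payload

    text = b"the quick brown fox " * 100    # highly compressible
    noise = os.urandom(2000)                # effectively incompressible
    assert decompress_block(compress_block(text)) == text
    assert decompress_block(compress_block(noise)) == noise
    assert len(compress_block(noise)) <= len(noise) + 1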
Packet Reordering
The Internet architecture does not guarantee that packets will
arrive in the same order in which they were originally transmitted.
However, we recommend that subnetworks avoid gratuitously
re-ordering packets. Since TCP returns a cumulative
acknowledgment (ACK) indicating the last in-order segment that has
arrived, out-of-order segments cause a TCP receiver to transmit a
duplicate acknowledgment. When the TCP sender notices three
duplicate acknowledgments it assumes that a segment was dropped by
the network and uses the fast retransmit algorithm [Jac90,APS99] to
resend the segment. In addition, the congestion window is reduced
by half, effectively halving TCP's sending rate. If a subnetwork
re-orders segments badly enough that three duplicate ACKs are
generated, the TCP sender needlessly reduces the congestion window,
and therefore its performance.
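The following Python sketch of cumulative acknowledgment shows why
re-ordering generates duplicate ACKs. Sequence numbers count whole
segments here for brevity; real TCP numbers individual bytes.

    def cumulative_acks(arrivals):
        """Return the ACK emitted in response to each arriving segment."""
        expected, buffered, acks = 0, set(), []
        for seg in arrivals:
            if seg == expected:
                expected += 1
                while expected in buffered:    # deliver buffered segments
                    buffered.remove(expected)
                    expected += 1
            else:
                buffered.add(seg)              # out of order: ACK repeats
            acks.append(expected)              # cumulative ACK = next expected
        return acks

    # Segment 0 delayed behind four later segments: the receiver repeats
    # the same cumulative ACK four times, the sender's duplicate count
    # reaches three, and it needlessly fast-retransmits and halves its
    # congestion window.
    print(cumulative_acks([1, 2, 3, 4, 0]))    # [0, 0, 0, 0, 5]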
Mobility
[best provided at a higher layer, for performance and flexibility
reasons, but some subnet mobility can be a convenience as long as
it's not too inefficient with routing]
Multicasting
Similar to the case of broadcast and discovery, multicast is more
efficient on shared links where it is supported natively. Native
multicast support requires a reasonable number (?? - over 10, under
1000?) of separate link-layer broadcast addresses. One such address
SHOULD be reserved for native link broadcast; other addresses
SHOULD be provided to support separate multicast groups (and there
SHOULD be at least 10?? such addresses).
The other criterion for native multicast is a link-layer filter, which
can select individual or sets of broadcast addresses. Such link
filters avoid having every host parse every multicast message in the
driver; a host receives, at the network layer, only those packets that
pass its configured link filters. A shared link SHOULD support
multiple, programmable link filters, to support efficient native
multicast.
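The following Python sketch shows a hash-bucket group address filter
of the sort many Ethernet controllers implement. The 64-bucket size
and the use of CRC-32 as the hash function are illustrative choices,
not requirements.

    import zlib

    class MulticastHashFilter:
        """Programmable hash-bucket filter for link-layer group
        addresses."""
        BUCKETS = 64    # illustrative size

        def __init__(self):
            self.buckets = [False] * self.BUCKETS

        def _bucket(self, addr):
            return zlib.crc32(addr) % self.BUCKETS

        def join(self, addr):
            """Program the filter to accept a multicast group address."""
            self.buckets[self._bucket(addr)] = True

        def accepts(self, addr):
            # A hash filter may accept some unjoined groups (hash
            # collisions), which the network layer must still discard,
            # but it never rejects a group that was joined.
            return self.buckets[self._bucket(addr)]

    f = MulticastHashFilter()
    f.join(bytes.fromhex("01005e000001"))   # a group address as an Ethernet MAC
    assert f.accepts(bytes.fromhex("01005e000001"))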
[Multicasting can be simulated over unicast subnets by sending
multiple copies of packets, but this is wasteful. If the subnet can
support native multicasting in an efficient way, it should do so]
Broadcasting and Discovery
Link layers fall into two categories: point-to-point and shared
link. A point-to-point link has exactly two endpoint components
(hosts or gateways); a shared link has more than two, either on
an inherently broadcast medium (e.g., Ethernet, radio) or on a
switching layer hidden from the network layer (switched Ethernet,
Myrinet, ATM).
There are a number of Internet protocols which make use of link
layer broadcast capabilities. These include link layer address
lookup (ARP), auto-configuration (RARP, BOOTP, DHCP), and routing
(RIP). These protocols require broadcast-capable links. Shared
links SHOULD support native, link layer subnet broadcast.
The lack of broadcast can impede the performance of these protocols,
or in some cases render them inoperable. ARP-like link address
lookup can be provided by a centralized database, rather than
owner response to broadcast queries. This comes at the expense
of potentially higher response latency and the need for explicit
knowledge of the ARP server address (no automatic ARP discovery).
For other protocols, if a link does not support broadcast, the
protocol is inoperable. This is the case for DHCP, for example.
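The following Python sketch illustrates the fallback just described.
The broadcast_query() and query_server() callbacks are placeholders
for whatever resolution mechanisms the link and resolver actually
provide; they are assumptions of this sketch.

    def resolve_link_address(ip_addr, link_supports_broadcast,
                             broadcast_query, query_server,
                             server_addr=None):
        if link_supports_broadcast:
            return broadcast_query(ip_addr)    # classic ARP-style lookup
        if server_addr is None:
            # No broadcast and no configured server: resolution (and
            # protocols such as DHCP that depend on broadcast) cannot
            # work on this link.
            raise RuntimeError("no broadcast and no resolution server")
        # Higher latency and explicit configuration, but it works.
        return query_server(server_addr, ip_addr)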
Routing
[what is proper division between routing at the Internet layer and
routing in the subnet? Is it useful or helpful to Internet routing
to have subnetworks that provide their own internal routing?]
Security
[Security mechanisms should be placed as close as possible to the
entities that they protect. E.g., mechanisms that protect host
computers or users should be implemented at the higher layers and
operate on an end-to-end basis under control of the users. This
makes subnet security mechanisms largely redundant unless they are
to protect the subnet itself, e.g., against unauthorized use.]
References
[APS99] Mark Allman, Vern Paxson, W. Richard Stevens. TCP
Congestion Control, April 1999. RFC 2581.
[BPK98] Hari Balakrishnan, Venkata Padmanabhan, Randy H. Katz. The
Effects of Asymmetry on TCP Performance. ACM Mobile Networks
and Applications (MONET), 1998.
[Jac90] Van Jacobson. Modified TCP Congestion Avoidance Algorithm.
Email to the end2end-interest mailing list, April 1990. URL:
ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt.
[SRC81] Jerome H. Saltzer, David P. Reed and David D. Clark,
End-to-End Arguments in System Design. Second International
Conference on Distributed Computing Systems (April, 1981) pages
509-512. Published with minor changes in ACM Transactions in
Computer Systems 2, 4, November, 1984, pages 277-288. Reprinted
in Craig Partridge, editor Innovations in
internetworking. Artech House, Norwood, MA, 1988, pages
195-206. ISBN 0-89006-337-0. Also scheduled to be reprinted in
Amit Bhargava, editor. Integrated broadband networks. Artech
House, Boston, 1991. ISBN 0-89006-483-0.
http://people.qualcomm.com/karn/library.html.
[RFC791] Jon Postel. Internet Protocol, September 1981. RFC 791.
[RFC1577] M. Laubach. Classical IP and ARP over ATM, January 1994.
RFC 1577.
Authors' Addresses:
Phil Karn (karn@qualcomm.com)
Aaron Falk (afalk@panamsat.com)
Joe Touch (touch@isi.edu)
Marie-Jose Montpetit (marie@teledesic.com)