draft-ietf-dnsop-respsize-10.txt   draft-ietf-dnsop-respsize-11.txt 
Internet Engineering Task Force P. Vixie Internet Engineering Task Force P. Vixie
Internet-Draft Internet Systems Consortium Internet-Draft Internet Systems Consortium
Intended status: Informational A. Kato Intended status: Informational A. Kato
Expires: August 27, 2008 The University of Tokyo/WIDE Expires: January 16, 2009 Keio University/WIDE Project
Project July 15, 2008
February 24, 2008
DNS Referral Response Size Issues DNS Referral Response Size Issues
draft-ietf-dnsop-respsize-10 draft-ietf-dnsop-respsize-11
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 36 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 27, 2008. This Internet-Draft will expire on January 16, 2009.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
With a mandated default minimum maximum UDP message size of 512 With a mandated default minimum maximum UDP message size of 512
octets, the DNS protocol presents some special problems for zones octets, the DNS protocol presents some special problems for zones
wishing to expose a moderate or high number of authority servers (NS wishing to expose a moderate or high number of authority servers (NS
RRs). This document explains the operational issues caused by, or RRs). This document explains the operational issues caused by, or
related to this response size limit, and suggests ways to optimize related to this response size limit, and suggests ways to optimize
the use of this limited space. Guidance is offered to DNS server the use of this limited space. Guidance is offered to DNS server
implementors and to DNS zone operators. implementors and to DNS zone operators.
1. Introduction 1. Introduction and Overview
1.1. Introduction and Overview
The DNS standard limits UDP message size to 512 octets (see [RFC1035] The DNS standard limits UDP message size to 512 octets (see [RFC1035]
4.2.1). Even though this limitation was due to the required minimum 4.2.1). Even though this limitation was due to the required minimum
IP reassembly limit for IPv4, it became a hard DNS protocol limit and IP reassembly limit for IPv4, it became a hard DNS protocol limit and
is not implicitly relaxed by changes in a network layer protocol, for is not implicitly relaxed by changes in a network layer protocol, for
example to IPv6. example to IPv6.
The EDNS protocol extension starting with version 0 permits larger The EDNS (Extension Mechanisms for DNS) protocol extension starting
responses by mutual agreement of the requester and responder (see with version 0 permits larger responses by mutual agreement of the
[RFC2671] 2.3, 4.5), and it is recommended to support EDNS. The 512 requester and responder (see [RFC2671] 2.3, 4.5), and it is
octets UDP message size limit will remain in practical effect until recommended to support EDNS. The 512 octets UDP message size limit
virtually all DNS servers and resolvers support EDNS. will remain in practical effect until virtually all DNS servers and
resolvers support EDNS.
Since DNS responses include a copy of the request, the space Since DNS responses include a copy of the request, the space
available for response data is somewhat less than the full 512 available for response data is somewhat less than the full 512
octets. Negative responses are quite small, but for positive and octets. Negative responses are quite small, but for positive and
referral responses, every octet must be carefully and sparingly referral responses, every octet must be carefully and sparingly
allocated. While the response size of positive responses is also a allocated. While the response size of positive responses is also a
concern in [RFC3226], this document specifically addresses referral concern in [RFC3226], this document specifically addresses referral
response size. response size.
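The cost of echoing the request can be shown numerically. The following Python is a minimal illustration and is not part of the draft. It assumes a single question, a 512-octet limit and an uncompressed QNAME. The helper names qname_wire_length and space_after_question are hypothetical.

```python
def qname_wire_length(name: str) -> int:
    """Uncompressed wire length of a domain name (RFC 1035 3.1).

    Each label costs one length octet plus its characters; the name ends
    with a single zero-length root label.  Label limits are not checked.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def space_after_question(qname: str, limit: int = 512) -> int:
    """Octets left for answer, authority and additional data.

    Assumes one echoed question: 12-octet header, then QNAME followed by
    a 2-octet QTYPE and a 2-octet QCLASS.
    """
    header = 12
    question = qname_wire_length(qname) + 4
    return limit - header - question


print(space_after_question("www.example.com."))  # 479
```

A longer query name therefore leaves less room for the referral itself, which matters because the question is copied before any NS or glue data.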
2. Delegation Details 2. Delegation Details
2.1. Relevant Protocol Elements 2.1. Relevant Protocol Elements
A delegation response will include the following elements: A positive delegation response will include the following elements:
Header Section: fixed length (12 octets) Header Section: fixed length (12 octets)
Question Section: original query (name, class, type) Question Section: original query (name, class, type)
Answer Section: empty, or a CNAME/DNAME chain Answer Section: empty, or a CNAME/DNAME chain
Authority Section: NS RRset (nameserver names) Authority Section: NS RRset (nameserver names)
Additional Section: A and AAAA RRsets (nameserver addresses) Additional Section: A and AAAA RRsets (nameserver addresses)
Note: CNAME defines a canonical name ([RFC1034]) while DNAME maps an
entire subtree to another domain ([RFC2672]).
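The following Python sketch lays these elements out on the wire, which makes it possible to measure a referral's size. It is illustrative only and makes several assumptions: class IN, an empty answer section, a fixed TTL, no EDNS OPT record and no truncation handling. Owner names and NS RDATA are compressed with [RFC1035] 4.1.4 pointers. The function names encode_name, add_rr and build_referral are hypothetical.

```python
import ipaddress
import struct

TYPE_A, TYPE_NS, TYPE_AAAA, CLASS_IN = 1, 2, 28, 1


def encode_name(name, offsets, pos):
    """Encode `name` for message offset `pos`, compressing per RFC 1035 4.1.4.

    `offsets` maps lowercased suffixes already in the message to their offset.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    out = bytearray()
    for i in range(len(labels)):
        suffix = ".".join(labels[i:]).lower()
        if suffix in offsets:                      # reuse an earlier copy
            out.extend(struct.pack("!H", 0xC000 | offsets[suffix]))
            return bytes(out)
        if pos + len(out) < 0x4000:                # pointers hold 14 bits
            offsets[suffix] = pos + len(out)
        raw = labels[i].encode("ascii")
        out.append(len(raw))
        out.extend(raw)
    out.append(0)                                  # terminating root label
    return bytes(out)


def add_rr(msg, offsets, owner, rrtype, ttl, make_rdata):
    """Append one RR; make_rdata(offset) returns the RDATA bytes."""
    msg.extend(encode_name(owner, offsets, len(msg)))
    fixed_at = len(msg)
    msg.extend(bytes(10))                          # TYPE, CLASS, TTL, RDLENGTH
    rdata = make_rdata(len(msg))
    struct.pack_into("!HHIH", msg, fixed_at, rrtype, CLASS_IN, ttl, len(rdata))
    msg.extend(rdata)


def build_referral(qname, zone, glue, qtype=TYPE_A, ttl=172800):
    """Header, question, NS RRset for `zone`, then A/AAAA glue.

    `glue` maps each name server name to a list of address strings.
    """
    offsets = {}
    addrs = [(ns, ipaddress.ip_address(a)) for ns, lst in glue.items() for a in lst]
    msg = bytearray(struct.pack("!6H", 0, 0x8000, 1, 0, len(glue), len(addrs)))
    msg.extend(encode_name(qname, offsets, len(msg)))
    msg.extend(struct.pack("!HH", qtype, CLASS_IN))
    for ns in glue:
        add_rr(msg, offsets, zone, TYPE_NS, ttl,
               lambda pos, ns=ns: encode_name(ns, offsets, pos))
    for ns, addr in addrs:
        rrtype = TYPE_A if addr.version == 4 else TYPE_AAAA
        add_rr(msg, offsets, ns, rrtype, ttl, lambda pos, a=addr: a.packed)
    return bytes(msg)


msg = build_referral("www.example.com.", "example.com.", {
    "ns1.example.com.": ["192.0.2.1"],
    "ns2.example.net.": ["198.51.100.2", "2001:db8::2"],
})
print(len(msg))  # 140
```

In this sketch each glue owner name compresses to a 2-octet pointer into the NS RDATA. Each A glue RR therefore costs 16 octets and each AAAA glue RR 28 octets.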
If the total size of the UDP response exceeds 512 octets or the size If the total size of the UDP response exceeds 512 octets or the size
advertised in EDNS, and if the data that does not fit was "required", advertised in EDNS, and if the data that does not fit was "required",
then the TC bit will be set (indicating truncation). This will then the TC bit will be set (indicating truncation). This will
usually cause the requester to retry using TCP, depending on what usually cause the requester to retry using TCP, depending on what
information was desired and what information was omitted. For information was desired and what information was omitted. For
example, truncation in the authority section is of no interest to a example, truncation in the authority section is of no interest to a
stub resolver who only plans to consume the answer section. If a stub resolver who only plans to consume the answer section. If a
retry using TCP is needed, the total cost of the transaction is much retry using TCP is needed, the total cost of the transaction is much
higher. See [RFC1123] 6.1.3.2 for details on the requirement that higher. See [RFC1123] 6.1.3.2 for details on the requirement that
UDP be attempted before falling back to TCP. UDP be attempted before falling back to TCP.
RRsets are never sent partially unless the TC bit is set to indicate RRsets (Resource Record Set, see [RFC2136]) are never sent partially
truncation. When the TC bit is set, the final apparent RRset in the unless the TC bit is set to indicate truncation. When the TC bit is
final non-empty section must be considered "possibly damaged" (see set, the final apparent RRset in the final non-empty section must be
[RFC1035] 6.2, [RFC2181] 9). considered "possibly damaged" (see [RFC1035] 6.2, [RFC2181] 9).
With or without truncation, the glue present in the additional data With or without truncation, the glue present in the additional data
section should be considered "possibly incomplete", and requesters section should be considered "possibly incomplete", and requesters
should be prepared to re-query for any damaged or missing RRsets. should be prepared to re-query for any damaged or missing RRsets.
Note that truncation of the additional data section might not be Note that truncation of the additional data section might not be
signaled via the TC bit since additional data is often optional (see signaled via the TC bit since additional data is often optional (see
discussion in [RFC4472] B). discussion in [RFC4472] B).
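The packing rules above can be modelled in a short Python sketch. This is a simplified model and makes one important assumption: RRset sizes are precomputed and fixed. In practice, a record's size depends on compression against whatever is already in the message. The RRset class and the pack_response function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RRset:
    section: str    # "answer", "authority" or "additional"
    owner: str
    rrtype: str
    size: int       # octets on the wire (assumed fixed)
    required: bool  # must omission be signalled with the TC bit?


def pack_response(base_size, rrsets, limit=512):
    """Add whole RRsets, in order, while they fit within `limit` octets.

    Returns (included, tc).  An RRset is never split.  A required RRset
    that does not fit sets TC and ends the response; an optional one
    (typically glue in the additional section) is skipped without TC.
    """
    used, included = base_size, []
    for rrset in rrsets:
        if used + rrset.size <= limit:
            included.append(rrset)
            used += rrset.size
        elif rrset.required:
            return included, True
    return included, False


ns = RRset("authority", "example.com.", "NS", 390, True)
a = RRset("additional", "ns1.example.com.", "A", 16, False)
aaaa = RRset("additional", "ns1.example.com.", "AAAA", 28, False)
kept, tc = pack_response(100, [ns, a, aaaa])
print([r.rrtype for r in kept], tc)  # ['NS', 'A'] False
```

In this example the AAAA glue is dropped without setting TC. This illustrates why a requester should treat glue as "possibly incomplete" even when the TC bit is clear.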
DNS label compression allows the component labels of a domain name to DNS label compression allows the component labels of a domain name to
be instantiated exactly once per DNS message, and then referenced be instantiated exactly once per DNS message, and then referenced
skipping to change at page 4, line 6 skipping to change at page 4, line 4
Some queries to non-existing names can be large, but this is not a Some queries to non-existing names can be large, but this is not a
problem because negative responses need not contain any answer, problem because negative responses need not contain any answer,
authority or additional records. See [RFC2308] 2.1 for more authority or additional records. See [RFC2308] 2.1 for more
information about the format of negative responses. information about the format of negative responses.
The minimum useful number of name servers is two, for redundancy (see The minimum useful number of name servers is two, for redundancy (see
[RFC1034] 4.1). A zone's name servers should be reachable by all IP [RFC1034] 4.1). A zone's name servers should be reachable by all IP
protocols versions (e.g., IPv4 and IPv6) in common use. As long as protocols versions (e.g., IPv4 and IPv6) in common use. As long as
the servers are well managed, the server serving IPv6 might be the servers are well managed, the server serving IPv6 might be
different from the server serving IPv4 sharing the same server name. different from the server serving IPv4 sharing the same server name.
It is important to ensure that a zone has servers reachable by all IP
protocol in common use (e.g., IPv4 and IPv6).
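The two goals in this paragraph are at least two name servers and coverage of each IP version in common use. A zone's NS set could be checked against both goals with the following Python sketch. The input shape and the name check_ns_set are assumptions. The check only looks at which address RR types are published, not at actual reachability.

```python
FAMILIES = {"A": "IPv4", "AAAA": "IPv6"}


def check_ns_set(servers):
    """Check a zone's NS set for redundancy and IP version coverage.

    `servers` maps each name server name to the set of address RR types
    published for it, e.g. {"A", "AAAA"}.  Returns a list of problems;
    an empty list means both goals are met.
    """
    problems = []
    if len(servers) < 2:
        problems.append("fewer than two name servers")
    for rrtype, family in FAMILIES.items():
        if not any(rrtype in types for types in servers.values()):
            problems.append(f"no name server has an {family} ({rrtype}) address")
    return problems


print(check_ns_set({"ns1.example.": {"A"}, "ns2.example.": {"A", "AAAA"}}))  # []
```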
The best case is no truncation at all. This is because many The best case is no truncation at all. This is because many
requesters will retry using TCP immediately, or will automatically requesters will retry using TCP immediately, or will automatically
requery for RRsets that are possibly truncated, without considering requery for RRsets that are possibly truncated, without considering
whether the omitted data was actually necessary. whether the omitted data was actually necessary.
Anycasting [RFC3258] is a useful tool for performance and reliability Anycasting [RFC3258] is a useful tool for performance and reliability
without increasing the size of referral responses. without increasing the size of referral responses.
While it is irrelevant to the response size issue, all zones have to While it is irrelevant to the response size issue, all zones have to
skipping to change at page 5, line 4 skipping to change at page 4, line 48
missing glue records separately which will introduce extra queries missing glue records separately which will introduce extra queries
and extra time to resolve a given name. and extra time to resolve a given name.
A delegation response should prioritize glue records as follows. A delegation response should prioritize glue records as follows.
first: first:
All glue RRsets for one name server whose name is in or below the All glue RRsets for one name server whose name is in or below the
zone being delegated, or which has multiple address RRsets zone being delegated, or which has multiple address RRsets
(currently A and AAAA), or preferably both; (currently A and AAAA), or preferably both;
second: second:
Alternate between adding all glue RRsets for any name servers Alternate between adding all glue RRsets for any name servers
whose names are in or below the zone being delegated, and all whose names are in or below the zone being delegated, and all
glue RRsets for any name servers who have multiple address RRsets glue RRsets for any name servers who have multiple address RRsets
(currently A and AAAA); (currently A and AAAA);
thence: thence:
All other glue RRsets, in any order. All other glue RRsets, in any order.
Whenever there are multiple candidates for a position in this Whenever there are multiple candidates for a position in this
priority scheme, one should be chosen on a round-robin or fully priority scheme, one should be chosen on a round-robin or fully
random basis. The goal of this priority scheme is to offer random basis. The goal of this priority scheme is to offer
"necessary" glue first to fill into the response if possible. "necessary" glue first to fill into the response if possible.
If any "necessary content" is not able to fill in the response, then If any "necessary" content cannot be fit in the response, then it is
it is advisable that the TC bit be set in order to force a TCP retry, advisable that the TC bit be set in order to force a TCP retry,
rather than have the zone be unreachable. Note that a parent rather than have the zone be unreachable. Note that a parent
server's proper response to a query for in-child glue or below-child server's proper response to a query for in-child glue or below-child
glue is a referral rather than an answer, and that this referral must glue is a referral rather than an answer, and that this referral must
be able to contain the in-child or below-child glue, and that in be able to contain the in-child or below-child glue, and that in
outlying cases, only EDNS or TCP will be large enough to contain that outlying cases, only EDNS or TCP will be large enough to contain that
data. data.
The glue record order should be independent to the version of IP used
in the query because the DNS server just see a query from an
intermediate server rather than the query from the original client.
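The glue priority scheme of this section can be read as the following Python sketch. It orders name servers rather than individual RRsets. The assumption is that the caller then adds all glue RRsets for each server in turn until space runs out. A random shuffle stands in for the "round-robin or fully random" choice. Consistent with the note on IP versions, the sketch does not use the transport of the query. The names in_or_below and order_glue are hypothetical.

```python
import random


def in_or_below(name, zone):
    """True if `name` is `zone` itself or a name beneath it."""
    name, zone = name.lower().rstrip("."), zone.lower().rstrip(".")
    return zone == "" or name == zone or name.endswith("." + zone)


def order_glue(zone, servers, rng=random):
    """Order name servers for glue insertion, highest priority first.

    `servers` maps each NS name to the set of address RR types it has
    glue for, e.g. {"A", "AAAA"}.  Ties are broken by shuffling.
    """
    names = list(servers)
    rng.shuffle(names)
    inside = [n for n in names if in_or_below(n, zone)]
    dual = [n for n in names if {"A", "AAAA"} <= servers[n]]
    order = []

    # first: one server that is in/below the zone or dual-stack, preferably both
    candidates = [n for n in inside if n in dual] or inside or dual
    if candidates:
        order.append(candidates[0])

    # second: alternate between in-zone servers and dual-stack servers
    queues = [inside[:], dual[:]]
    turn = 0
    while queues[0] or queues[1]:
        queue = queues[turn % 2]
        turn += 1
        while queue and queue[0] in order:
            queue.pop(0)
        if queue:
            order.append(queue.pop(0))

    # thence: all other servers, in any (here: shuffled) order
    order.extend(n for n in names if n not in order)
    return order


servers = {
    "ns1.example.com.": {"A", "AAAA"},
    "ns2.example.com.": {"A"},
    "ns.other.net.": {"A", "AAAA"},
    "ns.third.org.": {"A"},
}
print(order_glue("example.com.", servers))
```

In this example only ns1.example.com. is both in-zone and dual-stack, so it always comes first. The in-zone and dual-stack servers follow, and ns.third.org. comes last.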
3. Analysis 3. Analysis
An instrumented protocol trace of a best case delegation response is An instrumented protocol trace of a best case delegation response is
shown in Figure 1. Note that 13 servers are named, and 13 addresses shown in Figure 1. Note that 13 servers are named, and 13 addresses
are given. This query was artificially designed to exactly reach the are given. This query was artificially designed to exactly reach the
512 octets limit. 512 octets limit.
;; flags: qr rd; QUERY: 1, ANS: 0, AUTH: 13, ADDIT: 13 ;; flags: qr rd; QUERY: 1, ANS: 0, AUTH: 13, ADDIT: 13
;; QUERY SECTION: ;; QUERY SECTION:
;; [23456789.123456789.123456789.\ ;; [23456789.123456789.123456789.\
skipping to change at page 9, line 5 skipping to change at page 9, line 5
clients, which are required to handle larger RRsets such as AAAA RRs, clients, which are required to handle larger RRsets such as AAAA RRs,
are more likely to speak EDNS which can use a larger UDP response are more likely to speak EDNS which can use a larger UDP response
size limit, and partly because the resource records (A and AAAA) are size limit, and partly because the resource records (A and AAAA) are
in different RRsets and are therefore divisible from each other. in different RRsets and are therefore divisible from each other.
Name server names which are at or below the zone they serve are more Name server names which are at or below the zone they serve are more
sensitive to referral response truncation, and glue records for them sensitive to referral response truncation, and glue records for them
should be considered "more important" than other glue records, in the should be considered "more important" than other glue records, in the
assembly of referral responses. assembly of referral responses.
If a zone is served by thirteen (13) name servers having a common
parent name (such as ?.ROOT-SERVERS.NET.) and each such name server
has a single address record in some protocol family (e.g., an A RR),
then all thirteen name servers or any subset thereof could have
address records in a second protocol family by adding a second
address record (e.g., an AAAA RR) without reducing the reachability
of the zone thus served.
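The arithmetic behind this is a per-RR count. Let S be the message size, \ell_q the wire length of QNAME, and, for each RR r, \ell_o(r) the owner name length and \ell_d(r) the RDATA length. Each RR also carries 10 octets of TYPE, CLASS, TTL and RDLENGTH. The sketch below assumes that every glue owner name compresses to a 2-octet pointer.

```latex
\begin{aligned}
S &= 12 + (\ell_q + 4) + \sum_{r} \bigl(\ell_o(r) + 10 + \ell_d(r)\bigr) \\
\text{A glue RR} &= 2 + 10 + 4 = 16 \text{ octets} \\
\text{AAAA glue RR} &= 2 + 10 + 16 = 28 \text{ octets} \\
13 \times 16 &= 208 \text{ octets (one A RR per server)} \\
13 \times 28 &= 364 \text{ octets (one AAAA RR per server)}
\end{aligned}
```

A and AAAA records form separate RRsets. A responder can therefore omit any AAAA RRsets that do not fit without disturbing the A glue already present.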
5. Security Considerations 5. Security Considerations
The recommendations contained in this document have no known security The recommendations contained in this document have no known security
implications. implications.
6. IANA Considerations 6. IANA Considerations
This document does not call for changes or additions to any IANA This document does not call for changes or additions to any IANA
registry. registry.
7. Acknowledgement 7. Acknowledgement
The authors thank Peter Koch, Rob Austein, Joe Abley, Mark Andrews, The authors thank Peter Koch, Rob Austein, Joe Abley, Mark Andrews,
Kenji Rikitake, Stephane Bortzmeyer, Olafur Gudmundsson, and Alfred Kenji Rikitake, Stephane Bortzmeyer, Olafur Gudmundsson, Alfred
Hoenes for their valuable comments and suggestions. Hoenes, and Alexander Mayrhofer for their valuable comments and
suggestions.
This work was supported by the US National Science Foundation This work was supported by the US National Science Foundation
(research grant SCI-0427144) and DNS-OARC. (research grant SCI-0427144) and DNS-OARC.
8. Normative References 8. References
[PERL] Wall, L., Christiansen, T., and J. Orwant, "Programming 8.1. Normative References
Perl, 3rd ed.", ISBN 0-596-00027-8, July 2000.
[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
STD 13, RFC 1034, November 1987. STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and [RFC1035] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, November 1987. specification", STD 13, RFC 1035, November 1987.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.
8.2. Informative References
[PERL] Wall, L., Christiansen, T., and J. Orwant, "Programming
Perl, 3rd ed.", ISBN 0-596-00027-8, July 2000.
[RFC1123] Braden, R., "Requirements for Internet Hosts - Application [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
and Support", STD 3, RFC 1123, October 1989. and Support", STD 3, RFC 1123, October 1989.
[RFC1996] Vixie, P., "A Mechanism for Prompt Notification of Zone [RFC1996] Vixie, P., "A Mechanism for Prompt Notification of Zone
Changes (DNS NOTIFY)", RFC 1996, August 1996. Changes (DNS NOTIFY)", RFC 1996, August 1996.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
Specification", RFC 2181, July 1997. "Dynamic Updates in the Domain Name System (DNS UPDATE)",
RFC 2136, April 1997.
[RFC2308] Andrews, M., "Negative Caching of DNS Queries (DNS [RFC2308] Andrews, M., "Negative Caching of DNS Queries (DNS
NCACHE)", RFC 2308, March 1998. NCACHE)", RFC 2308, March 1998.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
RFC 2671, August 1999. RFC 2671, August 1999.
[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection",
RFC 2672, August 1999.
[RFC3226] Gudmundsson, O., "DNSSEC and IPv6 A6 aware server/resolver [RFC3226] Gudmundsson, O., "DNSSEC and IPv6 A6 aware server/resolver
message size requirements", RFC 3226, December 2001. message size requirements", RFC 3226, December 2001.
[RFC3258] Hardie, T., "Distributing Authoritative Name Servers via [RFC3258] Hardie, T., "Distributing Authoritative Name Servers via
Shared Unicast Addresses", RFC 3258, April 2002. Shared Unicast Addresses", RFC 3258, April 2002.
[RFC3901] Durand, A. and J. Ihren, "DNS IPv6 Transport Operational [RFC3901] Durand, A. and J. Ihren, "DNS IPv6 Transport Operational
Guidelines", BCP 91, RFC 3901, September 2004. Guidelines", BCP 91, RFC 3901, September 2004.
[RFC4472] Durand, A., Ihren, J., and P. Savola, "Operational [RFC4472] Durand, A., Ihren, J., and P. Savola, "Operational
skipping to change at page 12, line 31 skipping to change at page 12, line 34
Paul Vixie Paul Vixie
Internet Systems Consortium Internet Systems Consortium
950 Charter Street 950 Charter Street
Redwood City, CA 94063 Redwood City, CA 94063
US US
Phone: +1 650 423 1300 Phone: +1 650 423 1300
Email: paul@vix.com Email: paul@vix.com
Akira Kato Akira Kato
The University of Tokyo/WIDE Project Keio University/WIDE Project
Information Technology Center, 2-11-16 Yayoi Graduate School of Media Design, 4-1-1 Hiyoshi
Bunkyo, Tokyo 113-8658 Kohoku, Yokohama 223-8526
JP JP
Phone: +81 3 5841 2750 Phone: +81 3 5418 6419
Email: kato@wide.ad.jp Email: kato@wide.ad.jp
Full Copyright Statement Full Copyright Statement
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors contained in BCP 78, and except as set forth therein, the authors
retain all their rights. retain all their rights.
 End of changes. 22 change blocks. 
42 lines changed or deleted 48 lines changed or added
