Internet Engineering Task Force                             S. Ruby, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                               L. Masinter
Expires: June 20, 2015                                             Adobe
                                                       December 17, 2014


                         Problem Statement: URL
                       draft-ruby-url-problem-00

Abstract

   This document lays out the problem space of possibly conflicting
   standards between multiple organizations for URLs and things like
   them, and proposes some actions to resolve the conflicts.  From a
   user or developer point of view, it makes no sense for there to be a
   proliferation of definitions of URL nor for there to be a
   proliferation of incompatible implementations.  This shouldn't be a
   competitive feature.  Therefore there is a need for the organizations
   involved to update and reconcile the various Internet Drafts,
   Recommendations, and Standards in this area.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 20, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents



Ruby & Masinter           Expires June 20, 2015                 [Page 1]


Internet-Draft           Problem Statement: URL            December 2014


   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Brief History of URL standards  . . . . . . . . . . . . . . .   2
   2.  Current Organizations and Specs in Development  . . . . . . .   3
     2.1.  IETF  . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  WHATWG  . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.3.  W3C . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.4.  WebPlatform . . . . . . . . . . . . . . . . . . . . . . .   4
     2.5.  Unicode Consortium  . . . . . . . . . . . . . . . . . . .   4
   3.  Problem Statements  . . . . . . . . . . . . . . . . . . . . .   4
   4.  Outline of Potential Solution . . . . . . . . . . . . . . . .   5
   5.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   5
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   5
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .   5
   8.  Informative References  . . . . . . . . . . . . . . . . . . .   5
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   7

1.  Brief History of URL standards

   This section contains a very compressed history of URL standards, in
   sufficient detail to set some context.

   The first standards-track specification for URLs was [RFC1738] in
   1994.  (That spec contains more background material.)  It defined
   URLs as ASCII only.  Although it quickly became clear that allowing
   non-ASCII characters was desirable, shoehorning UTF-8 into ASCII-only
   systems was unacceptable; at the time Unicode was not so widely
   deployed.  The approach taken was to leave "URI" alone and define a
   new protocol element, "IRI"; [RFC3987] was published in 2005 (in sync
   with the [RFC3986] update to the URI definition).

   The IRI-to-URI transformation specified in [RFC3987] had options; it
   wasn't a deterministic path.  The URI-to-IRI transformation was also
   heuristic, since there was no guarantee that percent-encoded (%xx)
   bytes in the URI were actually meant to be the bytes of a UTF-8
   encoding of a Unicode string.
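
   The ambiguity in the URI-to-IRI direction can be illustrated with a
   minimal Python sketch.  The function names below are hypothetical
   and are not taken from [RFC3987]; the decoder is deliberately
   simplified in that it decodes every escape, which a real converter
   would not do for reserved characters.

```python
from urllib.parse import quote, unquote_to_bytes


def iri_to_uri_component(text: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character."""
    # ASCII characters, including existing %xx escapes, are left as-is.
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="", encoding="utf-8")
        for ch in text
    )


def uri_to_iri_component(text: str):
    """Return the decoded string, or None if the bytes are not UTF-8."""
    raw = unquote_to_bytes(text)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The escapes carried some other encoding (or arbitrary bytes).
        return None


print(iri_to_uri_component("caf\u00e9"))   # escapes the UTF-8 bytes
print(uri_to_iri_component("caf%C3%A9"))   # bytes happen to be UTF-8
print(uri_to_iri_component("caf%E9"))      # a Latin-1 byte: no decoding
```

   Even when decoding succeeds, the sketch cannot tell whether the
   producer of the URI intended UTF-8 at all, which is why the
   conversion is a heuristic rather than an inverse.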

   To address issues and to fix URL for HTML5, a new IRI working group
   <https://tools.ietf.org/wg/iri/charters [1]> was established in IETF
   in 2009.  Despite years of development, the IRI group was closed in
   2014, with the consolation that the documents that were being
   developed in the IRI working group could be updated as individual
   submissions or within the "applications area" working group.  In
   particular, one of the IRI working group items was to update
   [appsawg-uri-scheme-reg], which is currently under development in
   IETF's application area.

   Independently, the HTML specifications in the WHATWG and W3C
   redefined "URL" in an attempt to match what some of the browsers were
   doing.  This definition was moved out into the "URL - Living
   Standard" [URL-LS].

   The world has also moved on.  ICANN has approved non-ASCII top-level
   domains, but the IDNA specs ([RFC3490] and [RFC5895]) did not fully
   address IRI processing.  Subsequently, the Unicode Consortium
   produced [UTS-46].

2.  Current Organizations and Specs in Development

   There are multiple umbrella organizations which have produced
   multiple documents, and it's unclear whether there's a trajectory to
   make them consistent.  This section tries to enumerate currently
   active organizations and specs.

   Organizations include the IETF [2], the WHATWG [3], the W3C [4], Web
   Platform.org [5], and the Unicode Consortium [6].  Relevant specs
   under development in each organization include:

2.1.  IETF

   [appsawg-uri-scheme-reg] and [kerwin-file-scheme] are under active
   development.

   The IRI working group closed, but work can continue in the
   Applications Area working group.  Three drafts that still need
   updating are now abandoned: [iri-3987bis], [iri-comparison], and
   [iri-bidi-guidelines].  They were originally intended to obsolete
   [RFC3987].

   In addition, there's quite a bit of activity around URNs and library
   identifiers in the URN working group, including some expressions of
   desire to update RFC 3986 to better accommodate desired URN semantics.

2.2.  WHATWG

   The [URL-LS] is being developed as a living standard [7].  It
   primarily focuses on specifying what is important for browsers.  The
   means by which new schemes might be registered is not yet defined.
   This work is based on [UTS-46], and is intended to obsolete both
   [RFC3986] and [RFC3987].



2.3.  W3C

   The Web Applications Working Group [8], in conjunction with the W3C
   TAG [9], has sporadically republished the WHATWG work, with no
   differences in technical content, as [W3C-URL].  There is a
   [url-workmode] proposal to formalize this relationship.

2.4.  WebPlatform

   [WP-URL] is being developed on the "develop" [10] GitHub branch based on
   [URL-LS].  It currently contains work that has yet to be folded back
   into the [URL-LS], primarily to rewrite the parser logic in a way
   that is more understandable and approachable.  The intent is to merge
   this work once it is ready, and to actively work to keep the two
   versions in sync.

2.5.  Unicode Consortium

   [UTS-46] defines parameterized functions for mapping domain names.
   [URL-LS] builds upon this work, specifying particular values to be
   used for these parameters.
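
   To make the idea of a parameterized mapping concrete, the following
   Python sketch shows how a single boolean flag can change which
   labels are accepted.  It is a toy and does not implement the
   [UTS-46] algorithm: the function name and the relaxed branch (which
   accepts any ASCII label) are assumptions made for illustration.
   The strict branch mirrors the familiar letters-digits-hyphen rule
   associated with the UseSTD3ASCIIRules parameter.

```python
import string

# Letters, digits, and hyphen: the characters allowed in strict mode.
LDH = set(string.ascii_lowercase + string.digits + "-")


def ascii_label_allowed(label: str, use_std3_ascii_rules: bool) -> bool:
    """Toy check for one ASCII label; NOT the UTS #46 algorithm."""
    if not label or not label.isascii():
        return False
    if not use_std3_ascii_rules:
        # Relaxed mode: accept any ASCII label (a simplification).
        return True
    # Strict mode: only LDH characters, no leading or trailing hyphen.
    lowered = label.lower()
    return (
        set(lowered) <= LDH
        and not lowered.startswith("-")
        and not lowered.endswith("-")
    )


for flag in (True, False):
    print(flag, ascii_label_allowed("my_host", flag))
```

   Two specifications that invoke the same function with different
   flag values will disagree about a label such as "my_host", which is
   why each standard needs to state the values it uses.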

3.  Problem Statements

   The main problem is conflicting specifications that overlap but don't
   match each other.

   Additionally, the following are issues that need to be resolved to
   make URL processing unambiguous and stable.

   o  Nomenclature: over the years, a number of different sets of
      terminology have been used.  URL / URI / IRI is not the only
      difference.  [tantek-slice] chronicles a number of differences.

   o  Parameterization: standards in this area need to define such
      matters as normalization forms and values for parameters such as
      UseSTD3ASCIIRules.

   o  Interoperability: even after accounting for the above, there is a
      demonstrable lack of interoperability across popular libraries and
      browsers.  [whatwg-interop] identifies a number of such
      differences.

   o  Specific scheme definitions: some UR* scheme definitions are
      woefully out of date, incomplete, or don't correspond to current
      practice, and the process for updating their definitions is
      unclear.  This includes "file:", for which there is a current
      effort, but there are others which need review (including "ftp:"
      and "data:").
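
   As one small illustration of the interoperability point, the Python
   standard library's generic parser can be run on an input that
   browsers typically normalize.  The input string is an arbitrary
   example chosen for this sketch; it is not taken from
   [whatwg-interop].

```python
from urllib.parse import urlsplit

parts = urlsplit("HTTP://EXAMPLE.com/a/../b")

print(parts.scheme)    # scheme as reported by this library
print(parts.netloc)    # authority exactly as split from the input
print(parts.hostname)  # host accessor provided by the library
print(parts.path)      # path as split; dot segments are not removed
print(parts.geturl())  # the library's reassembled form
```

   This library lowercases the scheme but leaves the case of the
   authority and the "/a/../b" path untouched when reassembling,
   whereas a parser following [URL-LS] is, as we understand it,
   expected to lowercase the host and remove the dot segments.  Two
   consumers of the same string can therefore end up with different
   results.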



4.  Outline of Potential Solution

   This problem clearly requires a cross-organizational solution,
   specifically:

   o  Build a plan to update or obsolete [RFC3986], [RFC3987],
      [RFC5895], and [kerwin-file-scheme] to be consistent with [URL-LS]
      and [UTS-46].  This may involve working to get the other
      specifications updated, if only to clarify nomenclature.

   o  Change the [URL-LS] goals to only obsolete specifications listed
      above that are not updated.  Presuming that [RFC3986] is updated,
      explicitly state that canonical URLs (i.e., the output of the URL
      parser) not only round trip, but also are valid URIs.

   o  Reconcile how [appsawg-uri-scheme-reg] and [URL-LS] handle
      currently unknown schemes, update [appsawg-uri-scheme-reg] to
      state that registration applies to both URIs and URLs, and update
      [URL-LS] to indicate that [appsawg-uri-scheme-reg] is how you
      register schemes.

   o  Have the W3C adopt [url-workmode].

   o  Other than responding to any feedback that may be provided, no
      changes to any Unicode Consortium product are required.
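
   The two properties asked of canonical URLs above (round-tripping,
   and being valid URIs) can be phrased as checks on a parser's output.
   The Python sketch below uses a hypothetical toy_canonicalize()
   function as a stand-in for a real URL parser and serializer; it is
   not the [URL-LS] algorithm and only handles ASCII hosts.  The
   character test is a necessary condition for [RFC3986] validity, not
   a full grammar check.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit

# Characters RFC 3986 permits in a URI, plus well-formed %xx escapes.
URI_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def toy_canonicalize(url: str) -> str:
    """Hypothetical stand-in for a URL parser plus serializer."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # assumes an ASCII host
    # Escape disallowed characters as UTF-8 bytes, leaving existing
    # escapes ("%") and reserved characters untouched.
    safe = "/:@!$&'()*+,;=%"
    path = quote(parts.path, safe=safe)
    query = quote(parts.query, safe=safe + "?")
    fragment = quote(parts.fragment, safe=safe + "?")
    return urlunsplit((parts.scheme, host, path, query, fragment))


def uses_only_uri_characters(text: str) -> bool:
    """True if every character is allowed somewhere in a URI."""
    return URI_CHARS.fullmatch(text) is not None


once = toy_canonicalize("HTTP://Example.COM/caf\u00e9?q=\u00fc#\u00df")
print(once)
print(toy_canonicalize(once) == once)   # round-trip (idempotence)
print(uses_only_uri_characters(once))   # URI character check
```

   A test suite shared between organizations could apply checks of
   this kind to every canonical URL a parser emits.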

5.  Acknowledgements

   Helpful comments and improvements to this document have come from
   Anne van Kesteren and Graham Klyne.

6.  IANA Considerations

   This memo currently includes no request to IANA, although an updated
   [appsawg-uri-scheme-reg] might add some additional requirements and
   information to the IANA URI scheme registry [11] to make clear that
   the schemes serve as URL schemes and IRI schemes as well as URI
   schemes.

7.  Security Considerations

   In addition to the security exposures created when URLs work
   differently in different systems, all of the security considerations
   defined in [RFC3490], [RFC3986], [RFC3987], and [RFC5895] apply to
   URLs.

8.  Informative References





   [RFC1738]  Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
              Resource Locators (URL)", RFC 1738, December 1994.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552, July
              2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [URL-LS]   van Kesteren, A. and S. Ruby, "URL Living Standard", 2014,
              <https://url.spec.whatwg.org/>.

   [UTS-46]   Davis, M. and M. Suignard, "Unicode IDNA Compatibility
              Processing", 2014, <http://unicode.org/reports/tr46/>.

   [W3C-URL]  van Kesteren, A. and S. Ruby, "URL Working Draft", 2014,
              <http://www.w3.org/TR/url/>.

   [WP-URL]   van Kesteren, A. and S. Ruby, "URL Standard", 2014,
              <https://specs.webplatform.org/url/webspecs/develop/>.

   [appsawg-uri-scheme-reg]
              Thaler, D., Hansen, T., Hardie, T., and L. Masinter,
              "Guidelines and Registration Procedures for New URI
              Schemes", 2014, <https://tools.ietf.org/html/draft-ietf-
              appsawg-uri-scheme-reg>.

   [iri-3987bis]
              Duerst, M., Suignard, M., and L. Masinter,
              "Internationalized Resource Identifiers (IRIs)", 2012,
              <https://tools.ietf.org/html/draft-ietf-iri-3987bis-13>.

   [iri-bidi-guidelines]
              Duerst, M., Masinter, L., and A. Allawi, "Guidelines for
              Internationalized Resource Identifiers with Bi-directional
              Characters (Bidi IRIs)", 2012, <https://tools.ietf.org/
              html/draft-ietf-iri-bidi-guidelines>.

   [iri-comparison]
              Masinter, L. and M. Duerst, "Comparison, Equivalence and
              Canonicalization of Internationalized Resource
              Identifiers", 2012, <https://tools.ietf.org/html/draft-
              ietf-iri-comparison>.

   [kerwin-file-scheme]
              Kerwin, M., "The file URI Scheme", 2014, <https://
              tools.ietf.org/html/draft-kerwin-file-scheme>.

   [tantek-slice]
              Celik, T., "How many ways can you slice a URL and name the
              pieces?", 2011, <http://tantek.com/2011/238/b1/many-ways-
              slice-url-name-pieces>.

   [url-workmode]
              Ruby, S., "URL WorkMode", 2014, <https://github.com/
              webspecs/url/blob/develop/docs/workmode.md#preface>.

   [whatwg-interop]
              Ruby, S., "URL test results", 2014, <https://
              url.spec.whatwg.org/interop/test-results/>.

Authors' Addresses

   Sam Ruby (editor)
   IBM
   Raleigh
   USA

   Email: rubys@intertwingly.net
   URI:   http://intertwingly.net/


   Larry Masinter
   Adobe
   345 Park Ave
   San Jose, CA  95110
   USA

   Email: masinter@adobe.com
   URI:   http://larry.masinter.net/





