CoRE Working Group                                             K. Hartke
Internet-Draft                                                  Ericsson
Intended status: Standards Track                            9 March 2020
Expires: 10 September 2020

                    Constrained Resource Identifiers
                        draft-ietf-core-href-03

Abstract

   The Constrained Resource Identifier (CRI) is a complement to the
   Uniform Resource Identifier (URI) that serializes the URI components
   in Concise Binary Object Representation (CBOR) instead of a sequence
   of characters.  This simplifies parsing, comparison and reference
   resolution in environments with severe limitations on processing
   power, code size, and memory size.

Note to Readers

   This note is to be removed before publishing as an RFC.

   The issues list for this Internet-Draft can be found at
   <https://github.com/core-wg/coral/labels/href>.

   A reference implementation and a set of test vectors can be found at
   <https://github.com/core-wg/coral/tree/master/binary/python>.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 September 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Constraints
   3.  Creation and Normalization
   4.  Comparison
   5.  CRI References
     5.1.  CBOR Serialization
     5.2.  Reference Resolution
   6.  Relationship between CRIs, URIs and IRIs
     6.1.  Converting CRIs to URIs
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Change Log
   Acknowledgements
   Author's Address

1.  Introduction

   The Uniform Resource Identifier (URI) [RFC3986] and its most common
   usage, the URI reference, are the Internet standard for linking to
   resources in hypertext formats such as HTML [W3C.REC-html52-20171214]
   and the HTTP "Link" header field [RFC8288].

   A URI reference is either a URI or a relative reference that must be
   resolved against a base URI.

   URI references are sequences of characters chosen from the
   repertoire of US-ASCII characters.  The individual components of a
   URI reference are delimited by a number of reserved characters, which
   necessitates the use of an escape mechanism ("percent-encoding") when
   these reserved characters are used in a non-delimiting function.  One
   component can also contain special dot-segments that affect how the
   component is to be interpreted.  The resolution of URI references
   involves parsing a character sequence into its components, combining
   those components with the components of a base URI, merging path
   components, removing dot-segments, and recomposing the result back
   into a character sequence.

   Overall, the proper handling of URI references is relatively
   intricate.  This can be a problem, especially in constrained
   environments [RFC7228] where nodes often have severe code size and
   memory size limitations.  As a result, many implementations in such
   environments choose to support only an ad-hoc, informally-specified,
   bug-ridden, non-interoperable subset of half of RFC 3986.

   This document defines the Constrained Resource Identifier (CRI) by
   constraining URIs to a simplified subset and serializing their
   components in Concise Binary Object Representation (CBOR)
   [RFC7049bis] instead of a sequence of characters.  This allows
   typical operations on URI references such as parsing, comparison and
   reference resolution to be implemented (including all corner cases)
   in a comparatively small amount of code.

   As a result of the simplification, however, CRIs are not capable of
   expressing all URIs permitted by the generic syntax of RFC 3986
   (hence the "constrained" in "Constrained Resource Identifier").  The
   supported subset includes all URIs of the Constrained Application
   Protocol (CoAP) [RFC7252], most URIs of the Hypertext Transfer
   Protocol (HTTP) [RFC7230], and many other URIs that are similar.  The
   exact constraints are defined in Section 2.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   Terms defined in this document appear in _cursive_ where they are
   introduced (rendered in plain text as the new term surrounded by
   underscores).

2.  Constraints

   A Constrained Resource Identifier consists of the same five
   components as a URI: scheme, authority, path, query, and fragment.
   The components are subject to the following constraints:

   C1.   The scheme name can be any Unicode string (see Definition D80
         in [Unicode]) that matches the syntax defined in Section 3.1 of
         [RFC3986] and is lowercase (see Definition D139 in [Unicode]).

   C2.   An authority is always a host identified by an IP address or
         registered name, along with optional port information.  User
         information is not supported.

   C3.   An IP address can be either an IPv4 address or an IPv6 address.
         IPv6 scoped addressing zone identifiers and future versions of
         IP are not supported.

   C4.   A registered name can be any Unicode string that is lowercase
         and in Unicode Normalization Form C (NFC) (see Definition D120
         in [Unicode]).  (The syntax may be further restricted by the
         scheme.)

   C5.   A port is always an integer in the range from 0 to 65535.
         Empty ports or ports outside this range are not supported.

   C6.   The port is omitted if and only if the port would be the same
         as the scheme's default port (provided the scheme is defining
         such a default port) or the scheme is not using ports.

   C7.   A path consists of zero or more path segments.  A path must not
         consist of a single zero-length path segment, which is
         considered equivalent to a path of zero path segments.

   C8.   A path segment can be any Unicode string that is in NFC
         (including the zero-length string) with the exception of the
         special "." and ".." complete path segments.  No special
         constraints are placed on the first path segment.

   C9.   A query always consists of one or more query parameters.  A
         query parameter can be any Unicode string that is in NFC.  It
         is often in the form of a "key=value" pair.  When converting a
         CRI to a URI, query parameters are separated by an ampersand
         ("&") character.  (This matches the structure and encoding of
         the query in CoAP URIs.)

   C10.  A fragment identifier can be any Unicode string that is in
         NFC.

   C11.  The syntax of registered names, path segments, query
         parameters, and fragment identifiers may be further restricted
         and sub-structured by the scheme.  There is no support,
         however, for escaping sub-delimiters that are not intended to
         be used in a delimiting function.

   C12.  When converting a CRI to a URI, any character that is outside
         the allowed character range or a delimiter in the URI syntax is
         percent-encoded.  Percent-encoding always uses the UTF-8
         encoding form (see Definition D92 in [Unicode]) to convert the
         character to a sequence of one or more octets.
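
   As an illustration of constraint C12, the following Python sketch
   percent-encodes a single path segment.  It is not part of the
   specification: the function name "encode_path_segment" and the
   choice of safe characters (the "pchar" set of RFC 3986) are
   assumptions made for this example.  The standard library function
   urllib.parse.quote() encodes text as UTF-8 and emits uppercase
   hexadecimal digits.

```python
from urllib.parse import quote

# Characters that may appear literally in a URI path segment in addition
# to the unreserved characters (which quote() never escapes):
# the RFC 3986 sub-delims plus ":" and "@".
_SEGMENT_SAFE = "!$&'()*+,;=:@"


def encode_path_segment(segment: str) -> str:
    """Percent-encode one CRI path segment for use in a URI.

    Hypothetical helper for illustration only.
    """
    # Every character outside the safe set is converted to its UTF-8
    # octets, and each octet is written as "%XX".
    return quote(segment, safe=_SEGMENT_SAFE, encoding="utf-8")
```

   A segment containing a slash, such as "a/b", must have the slash
   escaped because the slash is a delimiter in the URI syntax; a
   segment such as "core" is passed through unchanged.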

3.  Creation and Normalization

   Resource identifiers are generally created on the initial creation of
   a resource with a certain resource identifier, or the initial
   exposition of a resource under a particular resource identifier.

   A Constrained Resource Identifier SHOULD be created by the naming
   authority that governs the namespace of the resource identifier.  For
   example, for the resources of an HTTP origin server, that server is
   responsible for creating the CRIs for those resources.

   The creator MUST ensure that any CRI created satisfies the
   constraints defined in Section 2.  The creation of a CRI fails if the
   CRI cannot be validated to satisfy all of the constraints.

   If a creator creates a CRI from user input, it MAY apply the
   following (and only the following) normalizations to get the CRI more
   likely to validate: map the scheme name to lowercase (C1.); map the
   registered name to NFC (C4.); elide the port if it's the default port
   for the scheme (C6.); elide a single zero-length path segment (C7.);
   map path segments, query parameters and the fragment identifier to
   NFC (C8., C9., C10.).

   Once a CRI has been created, it can be used and transferred without
   further normalization.  All operations that operate on a CRI SHOULD
   rely on the assumption that the CRI is appropriately pre-normalized.
   (This does not contradict the requirement that when CRIs are
   transferred, recipients must operate on as-good-as untrusted input
   and fail gracefully in the face of malicious inputs.)
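
   The permitted normalizations could be sketched in Python as follows.
   This is an illustration under stated assumptions, not a normative
   algorithm: the function name, the argument layout, and the
   "DEFAULT_PORTS" table are hypothetical, and the result still has to
   be validated against all constraints of Section 2.

```python
import unicodedata

# Hypothetical table of default ports.  A real creator would take these
# values from the definitions of the schemes it supports.
DEFAULT_PORTS = {"coap": 5683, "http": 80, "https": 443}


def _nfc(text):
    """Map a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def normalize_user_input(scheme, host_name, port, path, query, fragment):
    """Apply only the normalizations permitted for user input.

    path and query are lists of strings; port and fragment may be None.
    """
    scheme = scheme.lower()                                 # C1
    host_name = _nfc(host_name)                             # C4
    if port is not None and port == DEFAULT_PORTS.get(scheme):
        port = None                                         # C6
    if path == [""]:
        path = []                                           # C7
    path = [_nfc(segment) for segment in path]              # C8
    query = [_nfc(parameter) for parameter in query]        # C9
    if fragment is not None:
        fragment = _nfc(fragment)                           # C10
    return scheme, host_name, port, path, query, fragment
```

   Note that the sketch deliberately does not lowercase the registered
   name, since that mapping is not among the normalizations listed
   above; an input with an uppercase registered name simply fails
   validation.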

4.  Comparison

   One of the most common operations on CRIs is comparison: determining
   whether two CRIs are equivalent, without using the CRIs to access
   their respective resource(s).

   Determination of equivalence or difference of CRIs is based on simple
   component-wise comparison.  If two CRIs are identical component-by-
   component (using code-point-by-code-point comparison for components
   that are Unicode strings) then it is safe to conclude that they are
   equivalent.

   This comparison mechanism is designed to minimize false negatives
   while strictly avoiding false positives.  The constraints defined in
   Section 2 imply the most common forms of syntax- and scheme-based
   normalizations in URIs, but do not comprise protocol-based
   normalizations that require accessing the resources or detailed
   knowledge of the scheme's dereference algorithm.  False negatives can
   be caused by resource aliases and CRIs that do not fully satisfy the
   constraints.

   When CRIs are compared to select (or avoid) a network action, such as
   retrieval of a representation, fragment components (if any) should be
   excluded from the comparison.
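
   A minimal Python sketch of component-wise comparison is shown below.
   The "CRI" tuple type and the "equivalent" function are assumptions
   made for this example; the document does not define an in-memory
   representation.  Python compares strings code point by code point
   and tuples element by element, which matches the comparison
   described above.

```python
from typing import NamedTuple, Optional, Tuple, Union


class CRI(NamedTuple):
    """Hypothetical in-memory form of a CRI (illustration only)."""
    scheme: str
    host: Union[str, bytes, None]   # registered name or IP address bytes
    port: Optional[int]
    path: Tuple[str, ...]
    query: Tuple[str, ...]
    fragment: Optional[str]


def equivalent(a: CRI, b: CRI, ignore_fragment: bool = False) -> bool:
    """Return True if both CRIs are identical component by component."""
    if ignore_fragment:
        # Used when the comparison selects (or avoids) a network action.
        a = a._replace(fragment=None)
        b = b._replace(fragment=None)
    return a == b
```

   Because no normalization is performed during comparison, two CRIs
   that differ only in Unicode normalization form are reported as
   different; this is a false negative of the kind caused by CRIs that
   do not fully satisfy the constraints.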

5.  CRI References

   The most common usage of a Constrained Resource Identifier is to
   embed it in resource representations, e.g., to express a hyperlink
   between the represented resource and the resource identified by the
   CRI.

   This section defines the serialization of CRIs in Concise Binary
   Object Representation (CBOR) [RFC7049bis].  To reduce representation
   size, CRIs are not serialized directly.  Instead, CRIs are indirectly
   referenced through _CRI references_ that take advantage of
   hierarchical locality.  The CBOR serialization of CRI references is
   specified in Section 5.1.

   The only operation defined on a CRI reference is _reference
   resolution_: the act of transforming a CRI reference into a CRI.  An
   application MUST implement this operation by applying the algorithm
   specified in Section 5.2 or any algorithm that is functionally
   equivalent to it.

   The method of transforming a CRI into a CRI reference is unspecified;
   implementations are free to use any algorithm as long as reference
   resolution of the resulting CRI reference yields the original CRI.

   When testing for equivalence or difference, applications SHOULD NOT
   directly compare CRI references; the references should be resolved to
   their respective CRI before comparison.

5.1.  CBOR Serialization

   A CRI reference is encoded as a CBOR array [RFC7049bis] that contains
   a sequence of zero or more options.  Each option consists of an
   option number followed by an option value, holding one component or
   sub-component of the CRI reference.  To reduce size, both option
   numbers and option values are immediate elements of the CBOR array
   and appear in alternating order.

   Not all possible sequences of options denote a well-formed CRI
   reference.  The structure can be described in the Concise Data
   Definition Language (CDDL) [RFC8610] as follows:

      CRI-Reference = [
        (?scheme, ?((host.name // host.ip), ?port) // path.type),
        *path,
        *query,
        ?fragment
      ]

      scheme    = (0, text .regexp "[a-z][a-z0-9+.-]*")
      host.name = (1, text)
      host.ip   = (2, bytes .size 4 / bytes .size 16)
      port      = (3, 0..65535)
      path.type = (4, 0..127)
      path      = (5, text)
      query     = (6, text)
      fragment  = (7, text)

   The options correspond to the (sub-)components of a CRI, as described
   in Section 2, with the addition of the "path.type" option.  The
   "path.type" option can be used to express path prefixes like "/",
   "./", "../", "../../", etc.  The exact semantics of the option values
   are defined by Section 5.2.  A sequence of options that is empty or
   starts with a "path" option is equivalent to the same sequence of
   options prefixed by a "path.type" option with value 2.

   Examples:

      [0, "coap",
       2, h'C6336401',
       3, 61616,
       5, ".well-known",
       5, "core"]

      [4, 0,
       5, ".well-known",
       5, "core",
       6, "rt=temperature-c"]

   A CRI reference is considered _absolute_ if the sequence of options
   starts with a "scheme" option.

   A CRI reference is considered _relative_ if the sequence of options
   is empty or starts with an option other than a "scheme" option.
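
   For illustration, a decoded CRI reference (a flat list in which
   option numbers and option values alternate) could be handled in
   Python as sketched below.  The function names are assumptions made
   for this example, CBOR decoding itself is out of scope, and the
   checks do not verify that the sequence is well-formed.

```python
SCHEME = 0  # option number of the "scheme" option


def to_pairs(items):
    """Group a flat, alternating list into (number, value) pairs."""
    if len(items) % 2 != 0:
        raise ValueError("option number without option value")
    return list(zip(items[0::2], items[1::2]))


def is_absolute(items):
    """True if the sequence of options starts with a "scheme" option."""
    return len(items) > 0 and items[0] == SCHEME


def is_relative(items):
    """True if the sequence is empty or starts with another option."""
    return not is_absolute(items)
```

   Applied to the two examples above, the first sequence is absolute
   and the second one, which starts with a "path.type" option, is
   relative.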

5.2.  Reference Resolution

   The term "relative" implies that a "base CRI" exists against which
   the relative reference is applied.  Aside from fragment-only
   references, relative references are only usable when a base CRI is
   known.

   The following steps define the process of resolving any CRI reference
   against a base CRI so that the result is a CRI in the form of an
   absolute CRI reference:

   1.  Establish the base CRI of the CRI reference and express it in the
       form of an absolute CRI reference.  The base CRI can be
       established in a number of ways; see Section 5.1 of [RFC3986].

   2.  Determine the values of two variables, T and E, depending on the
       first option of the CRI reference to be resolved, according to
       Table 1.

    +---------------------+------------------+------------------------+
    | First Option Number | T                | E                      |
    +=====================+==================+========================+
    | 0 (scheme)          | 0                | 0                      |
    +---------------------+------------------+------------------------+
    | 1 (host.name)       | 0                | 1                      |
    +---------------------+------------------+------------------------+
    | 2 (host.ip)         | 0                | 1                      |
    +---------------------+------------------+------------------------+
    | 3 (port)            | (invalid sequence of options)             |
    +---------------------+------------------+------------------------+
    | 4 (path.type)       | option value - 1 | if T < 0 then 5 else 6 |
    +---------------------+------------------+------------------------+
    | 5 (path)            | 1                | 6                      |
    +---------------------+------------------+------------------------+
    | 6 (query)           | 0                | 6                      |
    +---------------------+------------------+------------------------+
    | 7 (fragment)        | 0                | 7                      |
    +---------------------+------------------+------------------------+
    | none/empty sequence | 0                | 7                      |
    +---------------------+------------------+------------------------+

                  Table 1: Values of the Variables T and E

   3.  Initialize a buffer with all the options from the base CRI where
       the option number is less than the value of E.

   4.  If the value of T is greater than 0, remove the last T-many
       "path" options from the end of the buffer (up to the number of
       "path" options in the buffer).

   5.  Append all the options from the CRI reference to the buffer,
       except for any "path.type" option.

   6.  If the buffer contains a single "path" option and the value of
       that option is the zero-length string, remove that option from
       the buffer.

   7.  Return the sequence of options in the buffer.
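
   Steps 2 through 7 could be sketched in Python as follows.  This is
   an illustration only, not a reference implementation: it assumes
   that the base CRI is already available as an absolute CRI reference
   (step 1), that both inputs are well-formed, and that options are
   held as lists of (option number, option value) pairs.  The function
   and constant names are hypothetical.

```python
SCHEME, HOST_NAME, HOST_IP, PORT, PATH_TYPE, PATH, QUERY, FRAGMENT = range(8)


def resolve(base, ref):
    """Resolve the CRI reference `ref` against the absolute `base`."""
    # Step 2: determine T and E from the first option (Table 1).
    if not ref:
        t, e = 0, 7
    else:
        number, value = ref[0]
        if number == SCHEME:
            t, e = 0, 0
        elif number in (HOST_NAME, HOST_IP):
            t, e = 0, 1
        elif number == PATH_TYPE:
            t = value - 1
            e = 5 if t < 0 else 6
        elif number == PATH:
            t, e = 1, 6
        elif number == QUERY:
            t, e = 0, 6
        elif number == FRAGMENT:
            t, e = 0, 7
        else:
            # Includes a leading "port" option.
            raise ValueError("invalid sequence of options")

    # Step 3: copy the base options with an option number less than E.
    buffer = [option for option in base if option[0] < e]

    # Step 4: remove up to T "path" options from the end of the buffer.
    while t > 0 and buffer and buffer[-1][0] == PATH:
        buffer.pop()
        t -= 1

    # Step 5: append the reference's options, except any "path.type".
    buffer.extend(option for option in ref if option[0] != PATH_TYPE)

    # Step 6: drop a path that is a single zero-length segment.
    paths = [i for i, option in enumerate(buffer) if option[0] == PATH]
    if len(paths) == 1 and buffer[paths[0]][1] == "":
        del buffer[paths[0]]

    # Step 7: the buffer is the resolved, absolute CRI reference.
    return buffer
```

   For example, resolving a reference that consists of the single
   "path" option "temp" against a base with the path segments
   ".well-known" and "core" removes the last base segment and appends
   the new one, yielding the segments ".well-known" and "temp".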

6.  Relationship between CRIs, URIs and IRIs

   CRIs are meant to replace both Uniform Resource Identifiers (URIs)
   [RFC3986] and Internationalized Resource Identifiers (IRIs) [RFC3987]
   in constrained environments [RFC7228].  Applications in these
   environments may never need to use URIs and IRIs directly, especially
   when the resource identifier is used simply for identification
   purposes or when the CRI can be directly converted into a CoAP
   request.

   However, it may be necessary in other environments to determine the
   associated URI or IRI of a CRI, and vice versa.  Applications can
   perform these conversions as follows:

   CRI to URI
      A CRI is converted to a URI as specified in Section 6.1.

   URI to CRI
      The method of converting a URI to a CRI is unspecified;
      implementations are free to use any algorithm as long as
      converting the resulting CRI back to a URI yields an equivalent
      URI.

   CRI to IRI
      A CRI can be converted to an IRI by first converting it to a URI,
      and then converting the URI to an IRI as described in Section 3.2
      of [RFC3987].

   IRI to CRI
      An IRI can be converted to a CRI by first converting it to a URI
      as described in Section 3.1 of [RFC3987], and then converting the
      URI to a CRI.

   Everything in this section also applies to CRI references, URI
   references and IRI references.
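
   For the case mentioned at the start of this section, where a CRI is
   converted directly into a CoAP request, the following Python 3
   sketch shows one possible mapping to the Uri-Host, Uri-Port,
   Uri-Path and Uri-Query options of [RFC7252].  It is illustrative
   only: the representation of a CRI as a list of (name, value) pairs
   and all function names are assumptions made for this example, the
   CRI is assumed to be absolute, options with default values are not
   omitted, and the Proxy-Scheme option is not handled.

```python
import ipaddress

# CoAP option numbers from RFC 7252.
URI_HOST, URI_PORT, URI_PATH, URI_QUERY = 3, 7, 11, 15


def _nibble(n):
    """Return the 4-bit field that announces a delta or length of n."""
    if n < 13:
        return n
    if n < 269:
        return 13  # one extension byte follows
    if n < 65805:
        return 14  # two extension bytes follow
    raise ValueError("value too large for a CoAP option")


def _extension(n):
    """Return the extension bytes that follow the first option byte."""
    if n < 13:
        return b""
    if n < 269:
        return bytes([n - 13])
    return (n - 269).to_bytes(2, "big")


def encode_option(delta, value):
    """Serialize one CoAP option from its number delta and value bytes."""
    head = bytes([_nibble(delta) << 4 | _nibble(len(value))])
    return head + _extension(delta) + _extension(len(value)) + value


def cri_to_coap_options(cri):
    """Map an absolute CRI (hypothetical pair list) to CoAP options."""
    result, previous = b"", 0
    for name, value in cri:
        if name == "host.name":
            number, data = URI_HOST, value.encode("utf-8")
        elif name == "host.ip":
            address = ipaddress.ip_address(value)  # 4 or 16 packed bytes
            text = str(address) if address.version == 4 \
                else "[" + str(address) + "]"
            number, data = URI_HOST, text.encode("ascii")
        elif name == "port":
            number = URI_PORT
            data = value.to_bytes((value.bit_length() + 7) // 8, "big")
        elif name == "path":
            number, data = URI_PATH, value.encode("utf-8")
        elif name == "query":
            number, data = URI_QUERY, value.encode("utf-8")
        else:
            continue  # all other options are skipped in this sketch
        result += encode_option(number - previous, data)
        previous = number
    return result
```

   The sketch relies on the CRI options appearing in the order host,
   port, path, query, so that the CoAP option numbers never decrease
   and each delta is non-negative.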

6.1.  Converting CRIs to URIs

   Applications MUST convert a CRI reference to a URI reference by
   determining the components of the URI reference according to the
   following steps and then recomposing the components to a URI
   reference string as specified in Section 5.3 of [RFC3986].

   scheme
      If the CRI reference contains a "scheme" option, the scheme
      component of the URI reference consists of the value of that
      option.  Otherwise, the scheme component is undefined.

   authority
      If the CRI reference contains a "host.name" or "host.ip" option,
      the authority component consists of the host subcomponent,
      optionally followed by a colon (":") character and the port
      subcomponent.  Otherwise, the authority component is undefined.

      The host subcomponent consists of the value of the "host.name" or
      "host.ip" option.

      Any character in the value of a "host.name" option that is not in
      the set of unreserved characters (Section 2.3 of [RFC3986]) or
      "sub-delims" (Section 2.2 of [RFC3986]) MUST be percent-encoded.

      The value of a "host.ip" option MUST be represented as a string
      that matches the "IPv4address" or "IP-literal" rule (Section 3.2.2
      of [RFC3986]).

      If the CRI reference contains a "port" option, the port
      subcomponent consists of the value of that option in decimal
      notation.  Otherwise, the colon (":") character and the port
      subcomponent are both omitted.

   path
      If the CRI reference is an empty sequence of options or starts
      with a "port" option, a "path" option, or a "path.type" option
      where the value is not 0, the conversion fails.

      If the CRI reference contains a "host.name" option, a "host.ip"
      option or a "path.type" option where the value is not 0, the path
      component of the URI reference is prefixed by a slash ("/")
      character.  Otherwise, the path component is prefixed by the empty
      string.

      If the CRI reference contains one or more "path" options, the
      prefix is followed by the value of each option, separated by a
      slash ("/") character.

      Any character in the value of a "path" option that is not in the
      set of unreserved characters or "sub-delims" or a colon (":") or
      commercial at ("@") character MUST be percent-encoded.

      If the authority component is defined and the path component does
      not match the "path-abempty" rule (Section 3.3 of [RFC3986]), the
      conversion fails.

      If the authority component is undefined and the scheme component
      is defined and the path component does not match the "path-
      absolute", "path-rootless" or "path-empty" rule (Section 3.3 of
      [RFC3986]), the conversion fails.

      If the authority component is undefined and the scheme component
      is undefined and the path component does not match the "path-
      absolute", "path-noscheme" or "path-empty" rule (Section 3.3 of
      [RFC3986]), the conversion fails.

   query
      If the CRI reference contains one or more "query" options, the
      query component of the URI reference consists of the value of each
      option, separated by an ampersand ("&") character.  Otherwise, the
      query component is undefined.

      Any character in the value of a "query" option that is not in the
      set of unreserved characters or "sub-delims" or a colon (":"),
      commercial at ("@"), slash ("/") or question mark ("?") character
      MUST be percent-encoded.  Additionally, any ampersand character
      ("&") in the option value MUST be percent-encoded.

   fragment
      If the CRI reference contains a "fragment" option, the fragment
      component of the URI reference consists of the value of that
      option.  Otherwise, the fragment component is undefined.

      Any character in the value of a "fragment" option that is not in
      the set of unreserved characters or "sub-delims" or a colon (":"),
      commercial at ("@"), slash ("/") or question mark ("?") character
      MUST be percent-encoded.
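
   The component rules above can be sketched in Python 3 as follows.
   This is a minimal, non-normative illustration: the representation
   of a CRI reference as a list of (name, value) pairs, the use of a
   byte string for the "host.ip" value, and the names "pct_encode" and
   "cri_to_uri" are assumptions made for this example.  IP addresses
   are formatted with the standard "ipaddress" module.

```python
import ipaddress

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
                 "0123456789-._~")
SUB_DELIMS = set("!$&'()*+,;=")
HOST_CHARS = UNRESERVED | SUB_DELIMS
PATH_CHARS = HOST_CHARS | set(":@")
FRAGMENT_CHARS = PATH_CHARS | set("/?")
QUERY_CHARS = FRAGMENT_CHARS - set("&")  # "&" separates query options


def pct_encode(text, allowed):
    """Percent-encode the UTF-8 bytes of every character not in allowed."""
    return "".join(
        c if c in allowed
        else "".join("%{:02X}".format(b) for b in c.encode("utf-8"))
        for c in text)


def cri_to_uri(cri):
    """Convert a CRI reference (hypothetical pair list) to a string."""
    def values(name):
        return [v for (n, v) in cri if n == name]

    if not cri or cri[0][0] in ("port", "path") or \
            (cri[0][0] == "path.type" and cri[0][1] != 0):
        raise ValueError("not convertible to a URI reference")

    scheme = values("scheme")[0] if values("scheme") else None

    authority = None
    if values("host.name"):
        authority = pct_encode(values("host.name")[0], HOST_CHARS)
    elif values("host.ip"):
        address = ipaddress.ip_address(values("host.ip")[0])
        authority = str(address) if address.version == 4 \
            else "[" + str(address) + "]"
    if authority is not None and values("port"):
        authority += ":" + str(values("port")[0])

    # Slash prefix condition taken as worded in the "path" rule.
    rooted = authority is not None or \
        any(t != 0 for t in values("path.type"))
    path = ("/" if rooted else "") + \
        "/".join(pct_encode(s, PATH_CHARS) for s in values("path"))
    if authority is None:
        # Without an authority the path must not start with "//", and
        # without a scheme its first segment must not contain ":".
        first_segment = path.split("/")[0]
        if path.startswith("//") or \
                (scheme is None and ":" in first_segment):
            raise ValueError("path does not match the required rule")

    # Recompose the components (Section 5.3 of RFC 3986).
    uri = ""
    if scheme is not None:
        uri += scheme + ":"
    if authority is not None:
        uri += "//" + authority
    uri += path
    if values("query"):
        uri += "?" + "&".join(
            pct_encode(q, QUERY_CHARS) for q in values("query"))
    if values("fragment"):
        uri += "#" + pct_encode(values("fragment")[0], FRAGMENT_CHARS)
    return uri
```

   For example, under these assumptions, the options ("scheme",
   "coap"), ("host.name", "example.com"), ("path", "a b") would be
   converted to "coap://example.com/a%20b".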

7.  Security Considerations

   Parsers of CRI references must operate on input that is assumed to be
   untrusted.  This means that parsers MUST fail gracefully in the face
   of malicious inputs.  Additionally, parsers MUST be prepared to deal
   with resource exhaustion (e.g., resulting from the allocation of big
   data items) or exhaustion of the call stack (stack overflow).  See
   Section 10 of [RFC7049bis] for additional security considerations
   relating to CBOR.

   The security considerations discussed in Section 7 of [RFC3986] and
   Section 8 of [RFC3987] for URIs and IRIs also apply to CRIs.

8.  IANA Considerations

   This document has no IANA actions.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005,
              <https://www.rfc-editor.org/info/rfc3986>.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
              January 2005, <https://www.rfc-editor.org/info/rfc3987>.

   [RFC7049bis]
              Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", Work in Progress, Internet-Draft,
              draft-ietf-cbor-7049bis-13, 8 March 2020,
              <https://tools.ietf.org/html/draft-ietf-cbor-7049bis-13>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019, <https://www.rfc-editor.org/info/rfc8610>.

   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
              12.1.0", ISBN 978-1-936213-25-2, May 2019,
              <http://www.unicode.org/versions/Unicode12.1.0/>.

9.2.  Informative References

   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
              Address Text Representation", RFC 5952,
              DOI 10.17487/RFC5952, August 2010,
              <https://www.rfc-editor.org/info/rfc5952>.

   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
              Constrained-Node Networks", RFC 7228,
              DOI 10.17487/RFC7228, May 2014,
              <https://www.rfc-editor.org/info/rfc7228>.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014,
              <https://www.rfc-editor.org/info/rfc7230>.

   [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained
              Application Protocol (CoAP)", RFC 7252,
              DOI 10.17487/RFC7252, June 2014,
              <https://www.rfc-editor.org/info/rfc7252>.

   [RFC8288]  Nottingham, M., "Web Linking", RFC 8288,
              DOI 10.17487/RFC8288, October 2017,
              <https://www.rfc-editor.org/info/rfc8288>.

   [W3C.REC-html52-20171214]
              Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., and
              S. Moon, "HTML 5.2", World Wide Web Consortium
              Recommendation REC-html52-20171214, 14 December 2017,
              <https://www.w3.org/TR/2017/REC-html52-20171214>.

Appendix A.  Change Log

   This section is to be removed before publishing as an RFC.

   Changes from -02 to -03:

   *  Expanded the set of supported schemes (#3).

   *  Specified creation, normalization and comparison (#9).

   *  Clarified the default value of the "path.type" option (#33).

   *  Removed the "append-relation" path type (#41).

   *  Renumbered the remaining path types.

   *  Renumbered the option numbers.

   *  Restructured the document.

   *  Minor editorial improvements.

   Changes from -01 to -02:

   *  Changed the syntax of schemes to exclude upper case characters
      (#13).

   *  Minor editorial improvements (#34 #37).

   Changes from -00 to -01:

   *  None.

Acknowledgements

   Thanks to Christian Amsuess, Carsten Bormann, Ari Keranen, Jim Schaad
   and Dave Thaler for helpful comments and discussions that have shaped
   the document.

Author's Address

   Klaus Hartke
   Ericsson
   Torshamnsgatan 23
   SE-16483 Stockholm
   Sweden

   Email: klaus.hartke@ericsson.com