Network Working Group                                          K. Davies
Internet-Draft                                                     ICANN
Intended status: Informational                                A. Freytag
Expires: January 10, 2014                                     ASMUS Inc.
                                                            July 9, 2013


            Representing Label Generation Rulesets using XML
                       draft-davies-idntables-03

Abstract

   This memo describes a method of representing the domain name
   registration policy for a zone administrator using Extensible Markup
   Language (XML).  These policies, known as "Label Generation Rulesets"
   (LGRs), are particularly used for the implementation of
   Internationalised Domain Names (IDNs).  The rulesets are used to
   implement and share policy on which specific Unicode codepoints are
   permitted for registrations, which alternative codepoints are
   considered variants, and what actions may be performed on those
   variants.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents



Davies & Freytag        Expires January 10, 2014                [Page 1]


Internet-Draft      Label Generation Rulesets in XML           July 2013


   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction
   2.  Design Goals
   3.  Requirements
   4.  LGR Format
     4.1.  Namespace
     4.2.  Basic structure
     4.3.  Metadata
       4.3.1.  The version element
       4.3.2.  The date element
       4.3.3.  The language element
       4.3.4.  The domain element
       4.3.5.  The description element
       4.3.6.  The validity-start and validity-end elements
       4.3.7.  The unicode-version element
     4.4.  Codepoint Rules
       4.4.1.  Sequences
       4.4.2.  Variants
       4.4.3.  Result tagging
     4.5.  Whole Label Evaluation Rules
       4.5.1.  Basic concepts
       4.5.2.  Character Classes
       4.5.3.  Context rules
       4.5.4.  Action elements
     4.6.  Example table
   5.  Processing a label against an LGR
     5.1.  Determining eligibility for a label
     5.2.  Determining variants for a label
   6.  Conversion between other formats
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
   Appendix A.  RelaxNG Schema
   Appendix B.  Acknowledgements
   Appendix C.  Editorial Notes
     C.1.  Known Issues and Future Work
     C.2.  Sample tables and running code
     C.3.  Change History
   Authors' Addresses

1.  Introduction

   This memo describes a method of using Extensible Markup Language
   (XML) to describe the algorithm used to determine whether a given
   domain label is permitted, and under which circumstances.  These
   algorithms comprise a list of permissible codepoints, variants, and
   a number of conditions under which particular variant relationships
   apply.  They form part of a zone administrator's policies, and can
   be referred to as Label Generation Rulesets (LGRs), or IDN tables.

   Administrators of the zones for top-level domain registries have
   historically published their LGRs using ASCII text or HTML.  The
   formatting of these documents has been loosely based on the format
   used for the Language Variant Table in [RFC3743].  [RFC4290] also
   provides a "model table format" that describes a similar set of
   functionality.

   Through the first decade of IDN deployment, experience has shown that
   LGRs derived from these formats are difficult to consistently
   implement and compare due to their different formats.  A universal
   format, such as one using a structured XML format, will assist by
   improving machine-readability, consistency, reusability and
   maintainability of LGRs.  It also provides for more complex
   conditional implementation of variants that reflects the known
   requirements of current zone administrator policies.

   While the predominant usage of this specification is to represent IDN
   label policy, the format may also be used for describing ASCII domain
   name label rulesets.

2.  Design Goals

   The following items are explicit design goals of this format:

   o  The format MUST be implementable in a reasonably straightforward
      manner in software;

   o  The format SHOULD be able to be checked for formatting errors,
      such that common mistakes can be caught;

   o  An LGR MUST be able to express the set of valid codepoints that
      are allowed for registration under a specific zone administrator's
      policies;

   o  An LGR MUST be able to express computed alternatives to a given
      domain name based on a one-to-one or one-to-many relationship.
      These computed alternatives are commonly known as "variants";

   o  Variants SHOULD be able to be tagged with specific categories,
      such that the categories can be used to support registry policy
      (such as whether to list the computed variant in the zone, or to
      merely block it from registration);

   o  Variants MUST be able to be stipulated based on contextual
      information.  For example, specific variants may only be
      applicable when they follow another specific codepoint, or when
      the codepoint is displayed in a specific presentation form;

   o  The data contained within an LGR MUST be unambiguous, such that
      independent implementations that utilise the contents will arrive
      at the same results;

   o  LGRs SHOULD be suitable for comparison and re-use, such that one
      could easily compare the contents of two or more to see the
      differences, to merge them, and so on; and

   o  As many existing IDN tables as practicable SHOULD be able to be
      migrated to the LGR format with all applicable logic retained.

   It is explicitly NOT the goal of this format to stipulate what
   codepoints should be listed in an LGR by a zone administrator.  Which
   registration policies are used for a particular zone is outside the
   scope of this memo.

3.  Requirements

   To ensure that the format supports the known uses of LGRs, the
   existing corpus of published IDN tables was reviewed in preparing
   this specification.

   In addition, the requirements of ICANN's work to implement an LGR for
   the DNS Root Zone [LGR-PROCEDURE] were also considered.  In Section B
   of that document, five specific requirements for an LGR methodology
   were identified:

   o  The ability to identify a set of codepoints that are permitted.

   o  The ability to represent a list of variants, if any, for each
      codepoint.

   o  A method of identifying codepoints that are related, using a tag.

   o  The ability to describe rules regarding the possible actions that
      may be performed on the resulting label (such as blocked,
      allocatable, etc.).

   o  The ability to describe rules that check for ill-formed
      combinations across the whole label.

4.  LGR Format

   An LGR is expressed as a well-formed XML Document [XML].

4.1.  Namespace

   The XML Namespace URI is [TBD].

4.2.  Basic structure

   The basic XML framework of the document is as follows:

       <?xml version="1.0"?>
       <lgr xmlns="http://www.iana.org/lgr/0.1">
           ...
       </lgr>

   The "lgr" element contains several sub-elements.  First is a "meta"
   element that contains all meta-data associated with the LGR, such as
   its authorship, what it is used for, implementation notes and
   references.  This is followed by a "data" element that contains the
   substantive codepoint data.  Finally, an optional "rules" element
   contains information on whole-label evaluation rules, if any, along
   with any specific rules regarding the disposition of computed
   variants.

       <?xml version="1.0"?>
       <lgr xmlns="http://www.iana.org/lgr/0.1">
           <meta>
               ...
           </meta>
           <data>
               ...
           </data>
           <rules>
               ...
           </rules>
       </lgr>

   A document should contain exactly one "lgr" element, and within that
   optionally one "meta" element, exactly one "data" element and
   optionally one "rules" element.
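
   As an illustration only, the structural constraints above could be
   checked with Python's standard xml.etree.ElementTree module.  The
   helper name check_structure is hypothetical, and the namespace URI
   is the one that appears in the examples rather than an assigned
   value.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the examples; the final URI is still TBD.
LGR_NS = "http://www.iana.org/lgr/0.1"


def check_structure(xml_text):
    """Return a list of structural problems found (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}lgr" % LGR_NS:
        problems.append("root element is not 'lgr'")
        return problems
    # Count the direct children of "lgr" by local element name.
    counts = {}
    for child in root:
        name = child.tag.split("}")[-1]
        counts[name] = counts.get(name, 0) + 1
    if counts.get("meta", 0) > 1:
        problems.append("more than one 'meta' element")
    if counts.get("data", 0) != 1:
        problems.append("expected exactly one 'data' element")
    return problems


SAMPLE = """<?xml version="1.0"?>
<lgr xmlns="http://www.iana.org/lgr/0.1">
    <meta/>
    <data/>
</lgr>"""

print(check_structure(SAMPLE))
```

   A schema validator would normally be preferable; this sketch only
   mirrors the element counts stated in the text.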

4.3.  Metadata

   The "meta" element is used to express meta-data associated with the
   LGR.  It can be used to identify the author or relevant contact
   person, explain what the LGR is used for, and provide implementation
   notes as well as references.  The data contained within is not
   required by software consuming the LGR in order to calculate valid
   labels, or to calculate variants.

4.3.1.  The version element

   The "version" element is used to uniquely identify each version of
   the LGR being represented.  No specific format is required, but it is
   RECOMMENDED that it be a positive integer that is incremented with
   each revision of the file.

   An example of a typical first edition of a document:

       <version>1</version>

   A common alternative is to use a major-minor number scheme, where two
   decimal numbers are used to represent major and minor changes to the
   LGR.  For example, "1.0" would be the first major release, "1.1"
   would be a minor update to that, and "2.0" would represent a major
   revision.

4.3.2.  The date element

   The "date" element is used to identify the date the LGR was written.
   The contents of this element MUST be a valid ISO 8601 date string as
   described in [RFC3339].

   Example of a date:

       <date>2009-11-01</date>
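
   A minimal sketch of how a consumer might check the contents of the
   "date" element, assuming the YYYY-MM-DD form shown in the example.
   The function name is_valid_date is hypothetical.

```python
import re
from datetime import date


def is_valid_date(text):
    """Return True if text is a real calendar date in YYYY-MM-DD form."""
    # Check the shape first, so that other ISO 8601 spellings are rejected.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


print(is_valid_date("2009-11-01"))   # True
print(is_valid_date("2009-02-30"))   # False: no such day
```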

4.3.3.  The language element

   The "language" element signals that the LGR is associated with a
   specific language or script.  The value of the language element must
   be a valid language tag as described in [RFC5646].  The tag may
   simply refer to a script if the LGR is not referring to a specific
   language.  There may be multiple language elements for an LGR if it
   spans multiple languages and/or scripts.

   Example of an English language LGR:

      <language>en</language>

   If the LGR applies to a specific script, rather than a language, the
   "und" language tag should be used followed by the relevant [RFC5646]
   script subtag.  For example, for a Cyrillic script LGR:

      <language>und-Cyrl</language>

4.3.4.  The domain element

   This optional element refers to a domain to which this policy is
   applied.

       <domain>example.com</domain>

   There may be multiple "domain" elements used to reflect a list of
   domains.

4.3.5.  The description element

   The "description" element is a free-form element that contains any
   additional relevant description that is useful to the user of the
   LGR in its interpretation.  Typically, this field contains
   authorship information, as well as additional context on how the LGR
   was formulated (such as its history, rationale and reference
   sources), and how it has been applied.

   The element has an optional "type" attribute, which specifies the
   media type of the enclosed data.  The attribute should be a valid
   MIME type and, if supplied, the contents are assumed to be of that
   type.  Typical types would be "text/plain" or "text/html".  If the
   description lacks a type attribute, "text/plain" is assumed.

4.3.6.  The validity-start and validity-end elements

   The "validity-start" and "validity-end" elements are optional
   elements that describe the time period during which the contents of
   the LGR are valid: the time from which they come into use in
   registry policy, and the time at which they cease to be used.

   The times should conform to the format described in section 5.6 of
   [RFC3339].  Each may consist of a date, or a date and time stamp.

4.3.7.  The unicode-version element

   If a given table is dependent on certain characters or functionality
   from a given version of the Unicode standard, the minimum version
   number MUST be listed in the "unicode-version" element.  If any
   software processing the table does not support the minimum requisite
   version, it MUST NOT perform any operations relating to whole-label
   evaluation.  This is because the Unicode properties for the
   codepoints may have changed in subsequent versions.

       <unicode-version>6.2</unicode-version>

4.4.  Codepoint Rules

   The bulk of a label generation ruleset is a description of which
   codepoints are eligible for a given label.  For rulesets that
   perform operations that result in potential variants, the
   codepoint-level relationships between variants also need to be
   described.

   The codepoint data is collected within a "data" element.  Within this
   element, a series of "char" and "range" elements describe eligible
   codepoints, or ranges of codepoints, respectively.

   Discrete permissible codepoints or codepoint sequences may be
   stipulated with a "char" element, e.g.

       <char cp="002D"/>

   Ranges of permissible codepoints may be stipulated with a "range"
   element, e.g.

       <range first-cp="0030" last-cp="0039"/>

   The range is inclusive of the first and last codepoints.

   Codepoints must be expressed in hexadecimal, i.e. according to the
   standard Unicode convention but without the prefix "U+".  The
   rationale for not allowing other encoding formats, including native
   Unicode encoding in XML, is explored in [UAX42].  The XML
   conventions used in this format, including the element and attribute
   names, mirror that document where it is practical and reasonable to
   do so.
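
   The following sketch shows one way the "char" and "range" elements
   could be read into a set of eligible codepoint sequences.  It is
   illustrative only: the names parse_cp and load_repertoire are
   hypothetical, and the namespace URI is the one used in the examples.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the examples; the final URI is still TBD.
NS = {"lgr": "http://www.iana.org/lgr/0.1"}


def parse_cp(value):
    """Convert a "cp" attribute such as "006C 00B7 006C" to a tuple of ints."""
    return tuple(int(part, 16) for part in value.split())


def load_repertoire(xml_text):
    """Collect every codepoint sequence listed in the "data" element."""
    root = ET.fromstring(xml_text)
    data = root.find("lgr:data", NS)
    repertoire = set()
    for char in data.findall("lgr:char", NS):
        repertoire.add(parse_cp(char.get("cp")))
    for rng in data.findall("lgr:range", NS):
        first = int(rng.get("first-cp"), 16)
        last = int(rng.get("last-cp"), 16)
        # The range is inclusive of both the first and last codepoints.
        for cp in range(first, last + 1):
            repertoire.add((cp,))
    return repertoire


SAMPLE = """<lgr xmlns="http://www.iana.org/lgr/0.1">
  <data>
    <char cp="002D"/>
    <range first-cp="0030" last-cp="0039"/>
  </data>
</lgr>"""

repertoire = load_repertoire(SAMPLE)
print(len(repertoire))             # 11: the hyphen plus ten digits
print((0x002D,) in repertoire)     # True
```

   Storing every entry as a tuple keeps single codepoints and
   multi-codepoint sequences in one collection.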

4.4.1.  Sequences

   A sequence of two or more codepoints may be specified in an LGR when
   the exact sequence of codepoints is required to occur in order for
   the constituent elements to be eligible.  This approach allows
   representation of policy where a specific codepoint is only eligible
   when preceded or followed by another codepoint.  For example, in
   order to represent the eligibility of the MIDDLE DOT (U+00B7) only
   when both preceded and followed by the LATIN SMALL LETTER L (U+006C):

   <char cp="006C 00B7 006C"/>

4.4.2.  Variants

   While most LGRs typically only determine codepoint eligibility,
   others additionally specify a mapping of codepoints to other
   codepoints, known as "variants".  What constitutes a variant is a
   matter of policy, and varies for each implementation.

4.4.2.1.  Basic variants

   Variants are specified as one or more children of a "char" element.

   For example, to map LATIN SMALL LETTER V (U+0076) as a variant of
   LATIN SMALL LETTER U (U+0075):

   <char cp="0075">
     <var cp="0076"/>
   </char>

   A sequence of multiple codepoints can be specified as a variant of a
   single codepoint.  For example, the sequence of LATIN SMALL LETTER O
   (U+006F) then LATIN SMALL LETTER E (U+0065) can be specified as a
   variant for LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) as follows:

   <char cp="00F6">
     <var cp="006F 0065"/>
   </char>

   Variants are specified in only one direction.  For symmetric
   variants, the inverse of the variant must be explicitly specified:

   <char cp="006F 0065">
     <var cp="00F6"/>
   </char>

   Both the source and target of a variant mapping may be sequences.  It
   is not possible to specify variants for ranges.
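
   As a rough sketch of how such one-directional mappings might be
   read and applied, the code below builds a mapping table from "var"
   elements and substitutes codepoints in a label.  It is a
   simplification under stated assumptions: only single-codepoint
   sources are substituted, "when" conditions are ignored, and the
   names load_variants and variant_labels are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# Namespace URI as used in the examples; the final URI is still TBD.
NS = {"lgr": "http://www.iana.org/lgr/0.1"}


def parse_cp(value):
    """Convert a "cp" attribute to a tuple of integer codepoints."""
    return tuple(int(part, 16) for part in value.split())


def load_variants(xml_text):
    """Map each source sequence to the list of its variant sequences."""
    root = ET.fromstring(xml_text)
    variants = {}
    for char in root.iterfind(".//lgr:char", NS):
        targets = [parse_cp(v.get("cp")) for v in char.findall("lgr:var", NS)]
        if targets:
            variants.setdefault(parse_cp(char.get("cp")), []).extend(targets)
    return variants


def variant_labels(label, variants):
    """Yield each label produced by substituting single codepoints."""
    options = []
    for ch in label:
        source = (ord(ch),)
        # Each position may keep its codepoint or take any of its variants.
        options.append([source] + variants.get(source, []))
    for combo in itertools.product(*options):
        candidate = "".join(chr(cp) for seq in combo for cp in seq)
        if candidate != label:
            yield candidate


SAMPLE = """<lgr xmlns="http://www.iana.org/lgr/0.1">
  <data>
    <char cp="0075"><var cp="0076"/></char>
    <char cp="00F6"><var cp="006F 0065"/></char>
  </data>
</lgr>"""

variants = load_variants(SAMPLE)
print(sorted(variant_labels("u\u00f6", variants)))
```

   Because mappings are one-directional, the table built here contains
   no entry for U+0076 unless the inverse mapping is also listed.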

4.4.2.2.  Null variants

   To specify a null variant, which is a variant string that maps to no
   codepoint, use an empty "cp" attribute.  For example, to map a string
   with a ZERO WIDTH NON-JOINER (U+200C) to the same string without the
   ZERO WIDTH NON-JOINER:

   <char cp="200C">
     <var cp=""/>
   </char>

4.4.2.3.  Conditional variants

   Fundamentally, variants are mappings between two sequences of
   codepoints.  However, in some instances, context external to the
   codepoint sequence must be considered for a variant relationship to
   exist.  For example, Arabic characters can take different forms
   depending on their position, and in some cases the positional
   context determines whether two codepoint sequences are variants of
   each other.  This positional context cannot be derived solely from
   the codepoint, as the codepoint is the same for the various forms.

   To specify a conditional variant relationship the "when" attribute is
   used.  The variant relationship exists when the condition in the
   "when" attribute is satisfied.  The following conditions are defined:

   arabic-initial  The codepoint is in a context where it would be
        presented in its Arabic Initial form.

   arabic-isolated  The codepoint is in a context where it would be
        presented in its Arabic Isolated form.

   arabic-medial  The codepoint is in a context where it would be
        presented in its Arabic Medial form.

   arabic-final  The codepoint is in a context where it would be
        presented in its Arabic Final form.

   For example, to mark ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
   (U+0673) as a variant of ARABIC LETTER ALEF WITH HAMZA BELOW
   (U+0625), but only when it appears in isolated or final forms:

   <char cp="0625">
     <var cp="0673" when="arabic-isolated"/>
     <var cp="0673" when="arabic-final"/>
   </char>

   Only a single "when" attribute can be applied to any "var" element;
   however, multiple "var" elements using the same mapping but
   different "when" attributes may be specified.

4.4.3.  Result tagging

   Typically, LGRs are used to explicitly designate allowable
   codepoints, with any label with a codepoint not explicitly listed in
   the LGR being considered an ineligible label according to the
   ruleset.

   For more complex registry rules, there may be a need to distinguish
   codepoints and variants of certain types.  This can be accomplished
   by applying a "tag" attribute, and then filtering results based on
   the tag using whole label evaluation.

   A tag may be of any value, but the following tags are pre-defined to
   encourage common conventions in their application.  If these tags can
   represent registry policy, they SHOULD be used.

4.5.  Whole Label Evaluation Rules

4.5.1.  Basic concepts

   The codepoints in a label sometimes need to satisfy context-based
   rules, in order for the label to be considered valid.  Whole Label
   Evaluation Rules (WLE) can be specified to support this validation.
   The same validation can be applied to variants created by applying
   the variant mapping.

   The whole label evaluation rules are contained in a "wle" element,
   which contains character class, rule and action elements.  These are
   described below.

   A Whole Label Evaluation Rule describes a complete label.  The
   elements of the "rule" element are:

   o  character classes, which define sets of codepoints to be used for
      context comparisons;

   o  context operators, which define when character classes may appear;
      and

   o  actions, which define what actions to take based on the context.

4.5.2.  Character Classes

   Character classes are named sets of characters that share a
   particular property.  They can be defined in several ways.

   1.  Define the property via matching a tag in the codepoint data.
       All characters with the same tag attribute are part of the same
       class.

   2.  Reference one of the Unicode character properties defined in the
       Unicode Character Database (UCD).

   3.  Explicitly list all the codepoints in the class.

   4.  Define a class as a combination of any number of these
       definitions or other classes.

4.5.2.1.  Tag-based classes

   If tags are defined using the "tag" attribute, classes are defined
   based upon the names of the tags used.  From these classes, further
   operations may be performed by context operators and actions.

4.5.2.2.  Unicode property based classes

   A class is defined in terms of Unicode properties by giving the
   Unicode property alias and the property value or property value
   alias.

   <class name="virama" property="ccc:9"/>

   The example above selects all characters for which the Unicode
   canonical combining class (ccc) value is 9.  This value of the ccc is
   assigned in the UCD to all characters that are viramas.  The string
   "ccc" is the short-alias for the canonical combining class, as
   defined in the file PropertyAliases.txt in the UCD.  [[Possibly
   change those to the labels used by the XML format of the UCD -- per
   UAX42]]

   Unicode properties may, in principle, change between versions of the
   Unicode Standard.  However, the values assigned for a given version
   are fixed.  If Unicode properties are used, they MUST be declared in
   the header, and the Unicode version must be defined.  (Note that
   some Unicode properties are stable across versions and do not change
   once assigned.  Nevertheless, in order to make sure the UCD version
   covers all the characters in the codepoint tables, it is necessary
   to give the version number in the header.)
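
   To illustrate a property-based class, the sketch below selects all
   codepoints with a given canonical combining class using Python's
   standard unicodedata module.  The function name class_by_ccc is
   hypothetical.  Note that unicodedata reflects the Unicode version
   bundled with the interpreter, which may differ from the version an
   LGR declares.

```python
import sys
import unicodedata


def class_by_ccc(value):
    """Return the set of codepoints whose canonical combining class is value."""
    return {
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.combining(chr(cp)) == value
    }


virama = class_by_ccc(9)
print(0x094D in virama)              # True: DEVANAGARI SIGN VIRAMA
print(unicodedata.unidata_version)   # Unicode version of this interpreter
```

   An implementation would compare the reported version against the
   "unicode-version" element before relying on the result.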

4.5.2.3.  Explicitly declared classes

   A class of codepoints may also be declared by listing the codepoints
   that are members of the class.  This is useful when tagging cannot
   be used because the codepoints are not part of the eligible set of
   codepoints for the given LGR.

   To define a class in terms of an explicit list of codepoints:

   <class name="abc">
       <char cp="0061"/>
       <char cp="0062"/>
       <char cp="0063"/>
   </class>

   This defines a class named "abc" containing the codepoints for
   characters "a", "b" and "c".  The ordering of the codepoints is not
   material, but it is RECOMMENDED to list them in ascending order.

   Range operators may also be used to represent a series of consecutive
   codepoints.  The same declaration can be made as follows:

   <class name="abc">
       <range first-cp="0061" last-cp="0063"/>
   </class>

4.5.2.4.  Combined classes

   Classes may be combined using logical operators for inversion, union,
   intersection, difference and symmetric difference (exclusive-or).

   <not class="xxx">
   <union class="xxx" class="yyy">
   <intersection class="xxx" class="yyy">
   <difference class="xxx" class="yyy">
   <symmetric-difference class="xxx" class="yyy">
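
   The semantics of these operators can be illustrated with ordinary
   set operations.  The sketch below treats each class as a Python set
   of codepoints.  Complementing against the eligible repertoire is an
   assumption made for this example; this document does not state what
   inversion is taken relative to.

```python
# Two illustrative classes, expressed as sets of codepoints.
xxx = {0x0061, 0x0062, 0x0063}            # a, b, c
yyy = {0x0062, 0x0063, 0x0064}            # b, c, d

# Assumed universe for inversion: the repertoire a-z.
repertoire = set(range(0x0061, 0x007B))

inverted = repertoire - xxx
union = xxx | yyy
intersection = xxx & yyy
difference = xxx - yyy
symmetric_difference = xxx ^ yyy


def show(cps):
    """Render a set of codepoints as a sorted list of characters."""
    return sorted(chr(cp) for cp in cps)


print(show(union))                  # ['a', 'b', 'c', 'd']
print(show(intersection))           # ['b', 'c']
print(show(difference))             # ['a']
print(show(symmetric_difference))   # ['a', 'd']
print(len(inverted))                # 23
```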

4.5.3.  Context rules

   Context rules consist of a series of logical conditions that must be
   satisfied in order for a label to meet a given context.  These rules
   relate to the appearance of character classes defined elsewhere in
   the table.

4.5.3.1.  The rule element

   A matching rule is defined by a "rule" element, which combines
   character classes with context operators.

   A simple rule to match a label where all characters are members of
   the class "preferred":

   <rule name="preferred">
      <class name="preferred" count="1+"/>
   </rule>

   The "count" attribute specifies the number of times a specific
   character class may appear.  This number should be an integer of 0
   or higher.  If it is followed by a plus character (+), the number of
   occurrences can be higher than the number stated.  Therefore, "1"
   would mean exactly one occurrence, whereas "1+" would indicate one
   or more occurrences.
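
   A small sketch of how the "count" attribute might be interpreted.
   The function names parse_count and count_matches are hypothetical.

```python
def parse_count(value):
    """Return (minimum, maximum) for a "count" value; None means unbounded."""
    if value.endswith("+"):
        return int(value[:-1]), None
    exact = int(value)
    return exact, exact


def count_matches(occurrences, value):
    """Return True if the number of occurrences satisfies the count value."""
    minimum, maximum = parse_count(value)
    if occurrences < minimum:
        return False
    return maximum is None or occurrences <= maximum


print(parse_count("1"))          # (1, 1)
print(parse_count("1+"))         # (1, None)
print(count_matches(3, "1+"))    # True
print(count_matches(2, "1"))     # False
```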



   For cases where several alternatives could be chosen, the "choice"
   element can encode a list of choices:

   <rule name="ldh">
      <choice>
          <class name="letters"/>
          <class name="digits"/>
          <class name="hyphen"/>
      </choice>
   </rule>

   For cases when a match may occur against any codepoint, use an "any"
   element:

   <rule name="starting-digit">
      <class name="digits" count="1"/>
      <any/>
   </rule>

   By default Whole Label Evaluation Rules always match the entire
   label.  Use attribute "match" with values "start", "anywhere" and
   "end" to define rules that need to match in specific positions of the
   label.

   Rules are named and can be nested by reference.

   The following is an example of a rule requiring that all labels be
   letters (optionally followed by combining marks) and possibly digits.

   <class name="letter" property="gc:L"/>
   <class name="combining-mark" property="gc:M"/>
   <class name="digit" property="gc:Nd"/>
   <rule name="letter-grapheme">
      <class name="letter" count="1+"/>
      <class name="combining-mark" count="0+"/>
   </rule>
   <rule name="leading-letter">
      <class name="letter-grapheme" count="1"/>
      <choice count="0+">
          <class name="letter-grapheme" count="0+"/>
          <class name="digit" count="0+"/>
      </choice>
   </rule>
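
   One possible reading of the "leading-letter" rule can be sketched in
   Python using the General Category values from the standard
   unicodedata module.  This is an interpretation for illustration, not
   a normative evaluation procedure: the label must start with a
   letter, may contain only letters, combining marks and decimal
   digits, and a combining mark may only follow a letter or another
   mark.  The function names are hypothetical.

```python
import unicodedata


def classify(ch):
    """Reduce a character to L (letter), M (mark), N (digit) or X (other)."""
    gc = unicodedata.category(ch)
    if gc.startswith("L"):
        return "L"
    if gc.startswith("M"):
        return "M"
    if gc == "Nd":
        return "N"
    return "X"


def matches_leading_letter(label):
    """Return True if the whole label fits the leading-letter rule."""
    kinds = [classify(ch) for ch in label]
    if not kinds or kinds[0] != "L":
        return False
    previous = ""
    for kind in kinds:
        if kind == "X":
            return False
        # A combining mark must extend a letter-grapheme.
        if kind == "M" and previous not in ("L", "M"):
            return False
        previous = kind
    return True


print(matches_leading_letter("abc1"))         # True
print(matches_leading_letter("1abc"))         # False
print(matches_leading_letter("a\u0301b2"))    # True
```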

4.5.4.  Action elements

   The purpose of a rule is to trigger a specific action.  Often, the
   action simply results in blocking a label that does not match a rule.
   For example, a blocking rule:

   <action action="blocked" not-match="leading-letter"/>

   An action may contain precisely one "match" or "not-match" attribute,
   but not both.  Because rules may be compound rules that contain other
   rules, only a single rule may be named as the value of the "match" or
   "not-match" attribute.

   The precise action taken and the name of the corresponding "action"
   attribute are not defined here.  It is strongly RECOMMENDED to use
   the following actions only with their conventional sense.

   block  The resulting string should be blocked from registration.
        This would typically apply for a derived variant that has no
        practical use, such as blocking confusingly similar but
        undesirable variants.

   allocate  The resulting string should be reserved for use by the same
        operator of the origin string, but not automatically allocated
        for use.

   activate  The resulting string should be activated for use.  (This is
        the typical default action if no tagging is used, and is known
        as a "preferred" variant in [RFC3743].)

4.6.  Example table

   A sample complete XML LGR is as follows.

       <?xml version="1.0"?>
       <lgr xmlns="http://www.iana.org/lgr/0.1">
           <meta>
                <version>1</version>
                <date>2010-01-01</date>
                <language>sv</language>
                <domain>example</domain>
                <description type="text/html">
                <![CDATA[
                    This language table was developed with the
                    <a href="http://swedish.example/">Swedish
                    examples institute</a>.
                ]]>
                </description>
           </meta>
           <data>
               <range first-cp="0061" last-cp="007A"/>
               <char cp="00E4"/>
           </data>
       </lgr>

5.  Processing a label against an LGR

5.1.  Determining eligibility for a label

   To test a specific domain label for membership in the LGR, a
   consumer of the LGR must iterate through each codepoint within the
   given U-label and test that each codepoint is a member of the LGR.
   If any codepoint is not a member of the LGR, the label shall be
   deemed not eligible under the table.

   A codepoint is deemed a member of the table when it is listed with
   a <char> element or falls within a <range> element, and all
   necessary conditions listed in "when" attributes are satisfied.
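
   As a minimal sketch of the test described above, and not a normative
   algorithm, the following Python fragment collects the codepoints
   listed in <char> and <range> elements and checks a label against
   them.  The function names and the sample LGR are illustrative
   assumptions.  "when" conditions and codepoint sequences are
   deliberately ignored here.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.iana.org/lgr/0.1}"

# Illustrative LGR, reduced from the example table in this document.
SAMPLE = """<lgr xmlns="http://www.iana.org/lgr/0.1">
  <data>
    <range first-cp="0061" last-cp="007A"/>
    <char cp="00E4"/>
  </data>
</lgr>"""


def load_repertoire(lgr_xml):
    """Return the set of code points (as integers) listed in the LGR."""
    allowed = set()
    data = ET.fromstring(lgr_xml).find(NS + "data")
    if data is None:
        return allowed
    for element in data:
        if element.tag == NS + "char":
            parts = element.get("cp", "").split()
            # Simplification: sequences of codepoints are skipped.
            if len(parts) == 1:
                allowed.add(int(parts[0], 16))
        elif element.tag == NS + "range":
            first = int(element.get("first-cp"), 16)
            last = int(element.get("last-cp"), 16)
            allowed.update(range(first, last + 1))
    return allowed


def is_eligible(label, allowed):
    """Return True if every code point of a non-empty label is allowed."""
    return bool(label) and all(ord(ch) in allowed for ch in label)


repertoire = load_repertoire(SAMPLE)
print(is_eligible("b\u00e4r", repertoire))  # True: U+00E4 is listed
print(is_eligible("b\u00f6r", repertoire))  # False: U+00F6 is not listed
```

   Expanding ranges into a set keeps the per-label test simple.  An
   implementation handling large ranges might prefer to store the range
   bounds instead.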

5.2.  Determining variants for a label

   For a given eligible label, the set of variants is deemed to be every
   possible permutation of the label formed by substituting codepoints
   with their <var> elements, provided that all "when" attributes are
   satisfied for each codepoint in the given permutation.
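
   The permutation step can be sketched as a Cartesian product over the
   choices available at each position.  The fragment below is an
   illustration under stated assumptions: the variant mapping is a
   plain dictionary rather than parsed <var> elements, each position
   may keep its original codepoint, and every "when" condition is
   treated as satisfied.  The function name is hypothetical.

```python
from itertools import product


def variant_labels(label, variants):
    """Return the set of variant labels for an eligible label.

    variants maps a code point (int) to a list of variant code points.
    """
    choices = []
    for ch in label:
        cp = ord(ch)
        # Assumption: a position may keep its original code point or
        # take any one of its declared variants.
        choices.append([cp] + list(variants.get(cp, [])))
    results = set()
    for combination in product(*choices):
        candidate = "".join(chr(cp) for cp in combination)
        # The original label is not counted as a variant of itself.
        if candidate != label:
            results.add(candidate)
    return results


# Hypothetical mapping: U+0061 has U+00E4 as its only variant.
mapping = {0x61: [0xE4]}
print(sorted(variant_labels("aa", mapping)))
```

   The number of permutations grows multiplicatively with label length,
   so an implementation may need to bound or stream the results rather
   than build the full set.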

6.  Conversion between other formats

   [RFC3743] and [RFC4290] each provide a different grammar for IDN
   tables.  These formats are unable to cater fully for the increased
   requirements of contemporary IDN variant policies.

   This specification is a superset of the functionality provided by
   these IDN table formats; thus any table expressed in those formats
   can be expressed in this format.  Automated conversion can be
   conducted between tables conformant with the grammar specified in
   each document.
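
   The output half of such a conversion might look like the following
   Python sketch.  It does not parse the [RFC3743] or [RFC4290] syntax;
   it assumes a table has already been read into a dictionary mapping
   each codepoint to its variants, and emits the corresponding <char>
   and <var> elements.  The function name and the dictionary layout are
   assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

LGR_NS = "http://www.iana.org/lgr/0.1"


def table_to_lgr(table):
    """Serialise a {code point: [variant code points]} table as LGR XML."""
    # The namespace is written as a literal xmlns attribute on the root.
    root = ET.Element("lgr", {"xmlns": LGR_NS})
    data = ET.SubElement(root, "data")
    for cp in sorted(table):
        # Code points are written as upper-case hex, at least 4 digits.
        char = ET.SubElement(data, "char", {"cp": format(cp, "04X")})
        for variant_cp in table[cp]:
            ET.SubElement(char, "var", {"cp": format(variant_cp, "04X")})
    return ET.tostring(root, encoding="unicode")


# Hypothetical input table with one variant relationship.
converted = table_to_lgr({0x61: [0xE4], 0x62: []})
print(converted)
```

   Metadata and rules are omitted here; a fuller converter would also
   populate the <meta> element from whatever the source table records.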

7.  IANA Considerations

   This document does not specify any IANA actions.

8.  Security Considerations

   There are no security considerations for this memo.

9.  References

   [LGR-PROCEDURE]
              Internet Corporation for Assigned Names and Numbers,
              "Procedure to Develop and Maintain the Label Generation
              Rules for the Root Zone in Respect of IDNA Labels".

   [RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the
              Internet: Timestamps", RFC 3339, July 2002.

   [RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
              Engineering Team (JET) Guidelines for Internationalized
              Domain Names (IDN) Registration and Administration for
              Chinese, Japanese, and Korean", RFC 3743, April 2004.

   [RFC4290]  Klensin, J., "Suggested Practices for Registration of
              Internationalized Domain Names (IDN)", RFC 4290,
              December 2005.

   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
              "Linguistic Guidelines for the Use of the Arabic Language
              in Internet Domains", RFC 5564, February 2010.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [UAX42]    Unicode Consortium, "Unicode Character Database in XML".

   [XML]      "Extensible Markup Language (XML) 1.0".

Appendix A.  RelaxNG Schema


   <?xml version="1.0" encoding="UTF-8"?>
   <grammar ns="http://www.iana.org/lgr/0.1"
     xmlns="http://relaxng.org/ns/structure/1.0">
     <define name="language-tag">
       <text/>
     </define>
     <define name="domain-name">
       <text/>
     </define>
     <define name="code-point">
       <text/>
     </define>
     <define name="variant-condition">
       <text/>
     </define>
     <define name="tag">
       <text/>
     </define>
     <define name="point-single">
       <element name="char">
         <attribute name="cp">
           <ref name="code-point"/>
         </attribute>
         <optional>
           <attribute name="tag">
             <ref name="tag"/>
           </attribute>
         </optional>
         <zeroOrMore>
           <ref name="point-variant"/>
         </zeroOrMore>
       </element>
     </define>
     <!-- Representation of a code point variant -->
     <define name="point-variant">
       <element name="var">
         <attribute name="cp">
           <ref name="code-point"/>
         </attribute>
         <optional>
           <attribute name="type"/>
         </optional>
         <optional>
           <attribute name="when">
             <ref name="variant-condition"/>
           </attribute>
         </optional>
       </element>
     </define>
     <define name="point-multiple">
       <element name="range">
         <attribute name="first-cp">
           <ref name="code-point"/>
         </attribute>
         <attribute name="last-cp">
           <ref name="code-point"/>
         </attribute>
         <text/>
       </element>
     </define>
     <define name="points">
       <oneOrMore>
         <choice>
           <ref name="point-single"/>
           <ref name="point-multiple"/>
         </choice>
       </oneOrMore>
     </define>
     <define name="any">
       <element name="any">
         <optional>
           <attribute name="count"/>
         </optional>
       </element>
     </define>
     <define name="class">
       <element name="class">
         <optional>
           <attribute name="count"/>
         </optional>
         <optional>
           <attribute name="name"/>
         </optional>
         <zeroOrMore>
           <ref name="points"/>
         </zeroOrMore>
       </element>
     </define>
     <define name="choice">
       <element name="choice">
         <optional>
           <attribute name="count"/>
         </optional>
         <oneOrMore>
           <ref name="class-matchers"/>
         </oneOrMore>
       </element>
     </define>
     <define name="class-matchers">
       <oneOrMore>
         <choice>
           <ref name="class"/>
           <ref name="any"/>
           <ref name="choice"/>
         </choice>
       </oneOrMore>
     </define>
     <define name="rule-declaration">
       <element name="rule">
         <attribute name="name"/>
         <oneOrMore>
           <ref name="class-matchers"/>
         </oneOrMore>
       </element>
     </define>
     <define name="action-declaration">
       <element name="action">
         <attribute name="action"/>
         <choice>
           <attribute name="match"/>
           <attribute name="not-match"/>
         </choice>
       </element>
     </define>
     <start>
       <ref name="lgr"/>
     </start>
     <define name="lgr">
       <element name="lgr">
         <attribute name="id"/>
         <optional>
           <ref name="meta-section"/>
         </optional>
         <ref name="data-section"/>
         <optional>
           <ref name="rules-section"/>
         </optional>
       </element>
     </define>
     <define name="meta-section">
       <element name="meta">
         <zeroOrMore>
           <choice>
             <optional>
               <element name="version">
                 <text/>
               </element>
             </optional>
             <optional>
               <element name="date">
                 <text/>
               </element>
             </optional>
             <zeroOrMore>
               <element name="language">
                 <ref name="language-tag"/>
               </element>
             </zeroOrMore>
             <zeroOrMore>
               <element name="domain">
                 <ref name="domain-name"/>
               </element>
             </zeroOrMore>
             <optional>
               <element name="validity-start">
                 <text/>
               </element>
             </optional>
             <optional>
               <element name="validity-end">
                 <text/>
               </element>
             </optional>
             <optional>
               <element name="unicode-version">
                 <text/>
               </element>
             </optional>
             <zeroOrMore>
               <element name="description">
                 <attribute name="type"/>
                 <text/>
               </element>
             </zeroOrMore>
           </choice>
         </zeroOrMore>
       </element>
     </define>
     <define name="data-section">
       <element name="data">
         <ref name="points"/>
       </element>
     </define>
     <define name="rules-section">
       <element name="rules">
         <zeroOrMore>
           <choice>
             <ref name="rule-declaration"/>
             <ref name="action-declaration"/>
           </choice>
         </zeroOrMore>
       </element>
     </define>
   </grammar>

Appendix B.  Acknowledgements

   This format builds upon the work on documenting IDN tables by many
   different registry operators.  Notably, a comprehensive language
   table for Chinese, Japanese and Korean was developed by the "Joint
   Engineering Team" [RFC3743] that is the basis of many registry
   policies; and a set of guidelines for Arabic script registrations
   [RFC5564] was published by the Arabic-language community.

   Contributions that have shaped this document have been provided by
   Francisco Arias, Mark Davis, Nicholas Ostler, Thomas Roessler, Steve
   Sheng and Andrew Sullivan.

Appendix C.  Editorial Notes

   This appendix is to be removed prior to final publication.

C.1.  Known Issues and Future Work

   o  A default set of actions should be defined if they are not
      explicitly accounted for in the table.

   o  A method of specifying the origin URI for a table, and an
      expiration or refresh policy, as meta-data may be a useful way to
      declare how the table will be updated.

C.2.  Sample tables and running code

   Some sample tables using this format, as well as a basic
   implementation of this specification, are posted at
   https://github.com/kjd/idntables

C.3.  Change History

   -00  Initial draft.

   -01  Add an XML Namespace, and fix other XML nits.  Add support for
        sequences of codepoints.  Use Unicode nomenclature more
        consistently.

   -02  Add support for validity periods.

   -03  Incorporate requirements from the Label Generation Ruleset
        Procedure for the DNS Root Zone.  These requirements include a
        detailed grammar for specifying whole-label variants, and the
        ability to explicitly declare the actions associated with a
        specific variant.  The document also consistently applies the
        term "Label Generation Ruleset", rather than "IDN table", to
        reflect the policy term now being used to describe these.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA  90094
   US

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.iana.org/


   Asmus Freytag
   ASMUS Inc.

   Email: asmus@unicode.org