
Network Working Group                                     A. El-Sherbiny
Internet-Draft                                                  M. Farah
Intended status: Informational                                  UN-ESCWA
Expires: April 24, 2009                                      I. Oueichek
                                            Syrian Telecom Establishment
                                                             A. Al-Zoman
                                                          SaudiNIC, CITC
                                                        October 21, 2008


   Linguistic Guidelines for the Use of Arabic Characters in Internet
                                Domains
               draft-farah-adntf-ling-guidelines-02.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 24, 2009.

Abstract

   This document constitutes technical specifications for the use of
   Arabic characters in Internet Domain names and provides linguistic
   guidelines for Arabic Domain Names.  It addresses Arabic-specific
   linguistic issues pertaining to the use of Arabic language in domain
   names.





El-Sherbiny, et al.       Expires April 24, 2009                [Page 1]

Internet-Draft         Arabic Character Guidelines          October 2008


Table of Contents

   1.  Introduction
   2.  Arabic Language-Specific Issues
     2.1.  Linguistic Issues
       2.1.1.  Diacritics (tashkeel) and Shadda
       2.1.2.  Kasheeda or Tatweel (Horizontal Character Size Extension)
       2.1.3.  Character Folding
     2.2.  Supported Character Set
     2.3.  Arabic Linguistic Issues Affected By Technical Constraints
       2.3.1.  Numerals
       2.3.2.  The Space Character
   3.  Summary and Conclusion
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements


1.  Introduction

   In March 2003, the Internet Engineering Task Force (IETF) issued a
   set of RFCs for Internationalized Domain Names (IDN) [1], [2], [3],
   which were planned to become the de facto standard for all
   languages.  In 2007 and 2008, Internet-Drafts proposing revisions to
   the IDNA protocol were released, as follows:

   o  Internationalized Domain Names for Applications (IDNA):
      Definitions, Background and Rationale [5]

   o  Internationalized Domain Names in Applications (IDNA): Protocol
      [6]

   o  An updated IDNA criterion for right-to-left scripts [7]

   o  The Unicode Codepoints and IDNA [8]

   Those documents are known collectively as "IDNA2008".

   This document constitutes a technical specification for the
   implementation of the IDN standards in the case of the Arabic
   language.  It allows standard language tables to be used for writing
   domain names in Arabic characters and should therefore be considered
   a logical extension to the IDN standards.  It presents guidelines
   for the proper use of Arabic characters with those standards.

   This document reflects the recommendations of the Arab Working Group
   on Arabic Domain Names (AWG-ADN) established by the League of Arab
   States (LAS), based on standardisation efforts of the United Nations
   Economic and Social Commission for Western Asia (UN-ESCWA) and its
   Internet-Draft, "Guidelines for an Arabic Internet Domain Name" [9].
   It is also in full harmony with recent rigorous discussions that took
   place with the major language communities that also use the Arabic
   script in their languages.

   This document provides guidelines for the ways Arabic characters may
   be used for registering Internet domain names and for how
   language-specific issues should be handled.  A few rules are
   recommended for application at the protocol level.

   The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY"
   in this document are to be interpreted as described in RFC 2119 [4].

   Comments on this document are solicited and should be addressed to
   the working group's mailing list at ESCWA-ICTD@un.org and/or the
   author(s).



2.  Arabic Language-Specific Issues

   The main objective of creating Arabic Domain Names is to provide a
   vehicle for increasing Internet use amongst all strata of the
   Arabic-speaking communities.

   Furthermore, a domain name that is not user-friendly would make the
   Internet seem even more ambiguous and unfamiliar to the
   Arabic-speaking communities, thus hindering the spread of the
   Internet and leading to further isolation of these communities at
   the global level.

   Hence, there have been intensive efforts, especially those
   spearheaded by Dr. Al-Zoman and contributed to by UN-ESCWA and its
   Arabic Domain Names Task Force (ADN-TF), to reach consensus on a
   multitude of linguistic issues with the following goals:

   o  To define the accepted Arabic character set to be used for writing
      domain names in Arabic, which is the subject of this document.

   o  To define the top-level domains of the Arabic domain name tree
      structure (i.e., Arabic gTLDs and ccTLDs).  This goal will be
      handled in a separate document.

   The first meeting of the AWG-ADN, held in Damascus in
   January-February 2005, gave special attention to the following:

   a.  Simplification of the domain names, whenever possible, to
       facilitate the interaction of the Arabic user with the Internet.

   b.  Adoption of solutions that do not lead to confusion either in
       reading or in writing, provided that this does not compromise the
       linguistic correctness of used words.

   c.  Prohibition of mixing Arabic and non-Arabic letters within a
       domain name label.

2.1.  Linguistic Issues

   A number of linguistic issues have been raised with respect to the
   use of the Arabic language in domain names.  This section highlights
   some of them, based on the papers of Dr. Al-Zoman [10] [11] and the
   report of the first meeting of the AWG-ADN [12].  For details, the
   reader is encouraged to consult these references.


2.1.1.  Diacritics (tashkeel) and Shadda

   Tashkeel and Shadda are diacritical marks placed above or below
   Arabic letters to indicate proper pronunciation.  They thus
   distinguish between words that share the same base characters but
   differ in meaning.

   Consistent with the IDNA2008 proposals, neither Tashkeel nor Shadda
   is permitted in zone files.  If necessary, they can be supported or
   ignored in the user interface through local mappings, but they are
   stripped before IDNA processing.

   The following are their Unicode code points:
   U+064B ARABIC FATHATAN
   U+064C ARABIC DAMMATAN
   U+064D ARABIC KASRATAN
   U+064E ARABIC FATHA
   U+064F ARABIC DAMMA
   U+0650 ARABIC KASRA
   U+0651 ARABIC SHADDA
   U+0652 ARABIC SUKUN
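
   As a minimal sketch of the stripping step described above, the
   following Python fragment removes the eight listed marks from a
   label typed in the user interface.  The helper name strip_tashkeel
   and the sample string are assumptions made for illustration; they
   come from neither this document nor the IDNA specifications.

```python
# Illustrative sketch only: strip_tashkeel is a hypothetical helper,
# not part of IDNA or of any library.

# The eight marks listed above: U+064B (FATHATAN) through U+0652 (SUKUN).
TASHKEEL = frozenset(chr(cp) for cp in range(0x064B, 0x0652 + 1))


def strip_tashkeel(label: str) -> str:
    """Return label with every Tashkeel and Shadda mark removed."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


# MEEM + DAMMA, HAH + FATHA, MEEM + SHADDA + FATHA, DAL
vocalized = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
bare = strip_tashkeel(vocalized)
print([f"U+{ord(ch):04X}" for ch in bare])
# ['U+0645', 'U+062D', 'U+0645', 'U+062F']
```

   Only the base letters remain, so the vocalized and unvocalized
   spellings typed by a user lead to the same string before IDNA
   processing.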

2.1.2.  Kasheeda or Tatweel (Horizontal Character Size Extension)

   Kasheeda (U+0640 ARABIC TATWEEL) must not be used in Arabic domain
   names and should be disallowed at the protocol level.

2.1.3.  Character Folding

   Character folding is the process by which multiple letters (that may
   be similar in shape) are folded into one shape.  Examples of such
   foldings of Arabic characters include:

   o  Folding Teh Marbuta (U+0629) and Heh (U+0647) at the end of a
      word;

   o  Folding different forms of Hamzah (U+0622, U+0623, U+0625,
      U+0627);

   o  Folding Alef Maksura (U+0649) and Yeh (U+064A) at the end of a
      word;

   o  Folding Waw with Hamzah Above (U+0624) and Waw (U+0648).

   With respect to the Arabic language, character folding is not
   acceptable because it changes the meaning of words and violates the
   rules of spelling.  Replacing a character valid for use in domain
   names with another valid character of similar shape yields a word
   with a different meaning.  With folding, a single word would stand
   for every word that can be formed by combining the folded
   characters, and all of those other words would be masked by it [10].

   Misspellings and handwriting errors do occur and lead to the mixing
   of different characters, although this is not the case in published
   and printed materials.  One of the motivations of this effort is to
   preserve the language, particularly with the spread of
   globalization.  Character folding works against this motivation
   because it has a negative effect on the principles and ethics of the
   language.  Technology should work to preserve the language, not to
   destroy it.  Thus, character folding should not be allowed.  The
   case of digits is treated separately in Section 2.3.1.
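
   The masking effect can be illustrated with a short Python sketch.
   The folding table below is hypothetical: it applies the foldings
   listed in this section at every position rather than only at the end
   of a word, and it exists solely to show the practice that is being
   rejected.  The two sample words are chosen for illustration.

```python
# Illustrative sketch only: this folding table is hypothetical and shows
# the practice that the text above rejects.
FOLD = str.maketrans({
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
    "\u0649": "\u064A",  # ALEF MAKSURA -> YEH
    "\u0624": "\u0648",  # WAW WITH HAMZA ABOVE -> WAW
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
})

ala = "\u0639\u0644\u0649"  # AIN LAM ALEF MAKSURA: the preposition "on"
ali = "\u0639\u0644\u064A"  # AIN LAM YEH: the name "Ali"

print(ala == ali)                                  # False
print(ala.translate(FOLD) == ali.translate(FOLD))  # True
```

   Without folding, the two words remain distinct labels.  With
   folding, both collapse to the same string, so one of them can no
   longer be registered independently.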

2.2.  Supported Character Set

   A domain name to be written in Arabic must be composed of a sequence
   of the following Unicode characters.  These are based on Unicode
   version 5.0.  The tables below are constructed using an
   inclusion-based approach.  Thus, characters that are not part of the
   tables are prohibited.

             +---------+-------------------------------------+
             | Unicode | Character Name                      |
             +---------+-------------------------------------+
             | 0621    | ARABIC LETTER HAMZA                 |
             | 0622    | ARABIC LETTER ALEF WITH MADDA ABOVE |
             | 0623    | ARABIC LETTER ALEF WITH HAMZA ABOVE |
             | 0624    | ARABIC LETTER WAW WITH HAMZA ABOVE  |
             | 0625    | ARABIC LETTER ALEF WITH HAMZA BELOW |
             | 0626    | ARABIC LETTER YEH WITH HAMZA ABOVE  |
             | 0627    | ARABIC LETTER ALEF                  |
             | 0628    | ARABIC LETTER BEH                   |
             | 0629    | ARABIC LETTER TEH MARBUTA           |
             | 062A    | ARABIC LETTER TEH                   |
             | 062B    | ARABIC LETTER THEH                  |
             | 062C    | ARABIC LETTER JEEM                  |
             | 062D    | ARABIC LETTER HAH                   |
             | 062E    | ARABIC LETTER KHAH                  |
             | 062F    | ARABIC LETTER DAL                   |
             | 0630    | ARABIC LETTER THAL                  |
             | 0631    | ARABIC LETTER REH                   |
             | 0632    | ARABIC LETTER ZAIN                  |
             | 0633    | ARABIC LETTER SEEN                  |
             | 0634    | ARABIC LETTER SHEEN                 |
             | 0635    | ARABIC LETTER SAD                   |
             | 0636    | ARABIC LETTER DAD                   |
             | 0637    | ARABIC LETTER TAH                   |
             | 0638    | ARABIC LETTER ZAH                   |
             | 0639    | ARABIC LETTER AIN                   |
             | 063A    | ARABIC LETTER GHAIN                 |
             | 0641    | ARABIC LETTER FEH                   |
             | 0642    | ARABIC LETTER QAF                   |
             | 0643    | ARABIC LETTER KAF                   |
             | 0644    | ARABIC LETTER LAM                   |
             | 0645    | ARABIC LETTER MEEM                  |
             | 0646    | ARABIC LETTER NOON                  |
             | 0647    | ARABIC LETTER HEH                   |
             | 0648    | ARABIC LETTER WAW                   |
             | 0649    | ARABIC LETTER ALEF MAKSURA          |
             | 064A    | ARABIC LETTER YEH                   |
             | 0660    | ARABIC-INDIC DIGIT ZERO             |
             | 0661    | ARABIC-INDIC DIGIT ONE              |
             | 0662    | ARABIC-INDIC DIGIT TWO              |
             | 0663    | ARABIC-INDIC DIGIT THREE            |
             | 0664    | ARABIC-INDIC DIGIT FOUR             |
             | 0665    | ARABIC-INDIC DIGIT FIVE             |
             | 0666    | ARABIC-INDIC DIGIT SIX              |
             | 0667    | ARABIC-INDIC DIGIT SEVEN            |
             | 0668    | ARABIC-INDIC DIGIT EIGHT            |
             | 0669    | ARABIC-INDIC DIGIT NINE             |
             +---------+-------------------------------------+

        Source: Supporting the Arabic Language in Domain Names [10]

         Table 1: CHARACTERS FROM UNICODE ARABIC TABLE (0600-06FF)

                       +---------+-----------------+
                       | Unicode | Digit Name      |
                       +---------+-----------------+
                       | 0030    | DIGIT ZERO      |
                       | 0031    | DIGIT ONE       |
                       | 0032    | DIGIT TWO       |
                       | 0033    | DIGIT THREE     |
                       | 0034    | DIGIT FOUR      |
                       | 0035    | DIGIT FIVE      |
                       | 0036    | DIGIT SIX       |
                       | 0037    | DIGIT SEVEN     |
                       | 0038    | DIGIT EIGHT     |
                       | 0039    | DIGIT NINE      |
                       | 002D    | HYPHEN-MINUS    |
                       | 002E    | FULL STOP (Dot) |
                       +---------+-----------------+

        Source: Supporting the Arabic Language in Domain Names [11]

      Table 2: CHARACTERS FROM UNICODE BASIC LATIN TABLE (0000-007F)
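
   The inclusion-based approach can be sketched in Python as a
   membership test against the two tables.  The names used here
   (ALLOWED_IN_LABEL, disallowed_code_points) are assumptions made for
   illustration, and the sketch checks code points only; it does not
   verify label length, hyphen placement, or any other IDNA rule.

```python
from typing import List

# Illustrative sketch only: the names below are hypothetical, and the
# sets simply transcribe Table 1 and Table 2.

# Table 1 letters: U+0621..U+063A and U+0641..U+064A (U+0640 is absent).
ARABIC_LETTERS = frozenset(
    chr(cp)
    for cp in list(range(0x0621, 0x063A + 1)) + list(range(0x0641, 0x064A + 1))
)
ARABIC_INDIC_DIGITS = frozenset(chr(cp) for cp in range(0x0660, 0x0669 + 1))
ASCII_DIGITS = frozenset("0123456789")
HYPHEN = "\u002D"

# FULL STOP (U+002E) is treated as the label separator, so it is not
# part of the per-label set.
ALLOWED_IN_LABEL = ARABIC_LETTERS | ARABIC_INDIC_DIGITS | ASCII_DIGITS | {HYPHEN}


def disallowed_code_points(name: str) -> List[str]:
    """List, as U+XXXX strings, the code points of name outside the tables."""
    found = []
    for label in name.split("."):
        for ch in label:
            if ch not in ALLOWED_IN_LABEL:
                found.append(f"U+{ord(ch):04X}")
    return found


print(disallowed_code_points("\u0645\u0648\u0642\u0639"))  # []
print(disallowed_code_points("\u0645\u0640\u0648"))        # ['U+0640']
```

   Because the test is inclusion-based, Tatweel (U+0640), the diacritics
   of Section 2.1.1, the space character, and Latin letters are all
   reported without having to be listed explicitly.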

2.3.  Arabic Linguistic Issues Affected By Technical Constraints

   In this section, technical aspects of some linguistic issues are
   discussed.

2.3.1.  Numerals

   In the Arab countries, there are two sets of numerical digits used:

   o  Set I: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) mostly used in the western
      part of the Arab world.

   o  Set II: (U+0660, U+0661, U+0662, U+0663, U+0664, U+0665, U+0666,
      U+0667, U+0668, U+0669) mostly used in the eastern part of the
      Arab world.

   Although visual differentiation between the Arabic zero (U+0660) and
   the dot (U+002E) in printed material is possible (the zero is larger
   in size and is printed higher than the dot), using the Arabic zero
   in domain names may lead to confusion.  Folding Set II to Set I will
   eliminate the problem of the zero in particular and that of numerals
   in general.

   Both sets may be supported in the user interface, but both must be
   folded to one set (Set I) at the phase in which internationalized
   strings are prepared (e.g., "stringprep"); i.e., numerals are stored
   in the zone file in ASCII format.
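
   The folding of Set II to Set I amounts to a fixed ten-entry mapping.
   A minimal Python sketch follows; the helper name fold_digits is an
   assumption made for illustration and is not defined by this document
   or by stringprep.

```python
# Illustrative sketch only: fold_digits is a hypothetical helper.

# Map U+0660..U+0669 (Set II) to U+0030..U+0039 (Set I).
SET_II_TO_SET_I = {0x0660 + i: ord("0") + i for i in range(10)}


def fold_digits(label: str) -> str:
    """Replace Arabic-Indic digits with the corresponding ASCII digits."""
    return label.translate(SET_II_TO_SET_I)


print(fold_digits("\u0662\u0660\u0660\u0668"))  # prints 2008
```

   All other characters pass through unchanged, so the same helper can
   be applied to a whole label that mixes letters and digits.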




2.3.2.  The Space Character

   The space character is strictly disallowed in domain names, as it is
   not part of the permitted character set.  Instead, the hyphen
   (Al-sharta) (i.e., U+002D) is proposed as a separator between Arabic
   words to avoid the confusion that can take place if the words are
   typed without a separator, unlike in ASCII.

   It is acceptable to use the hyphen to separate words within the same
   domain name label.


3.  Summary and Conclusion

   The proposed guidelines are in full accordance with the IETF IDN
   standards and take into account Arabic language-specific issues,
   striking a compromise between the grammatical rules of the Arabic
   language and the ease of use of the language on the Internet.


4.  Security Considerations

   No particular security considerations could be identified regarding
   the use of Arabic characters in writing domain names.  In particular,
   any potential visual confusion between different character strings is
   avoided using the guidelines proposed in this document.


5.  IANA Considerations

   This document has no action for IANA.


6.  Acknowledgments

   The ESCWA ICT Division provided support and funding for the
   development of this document with the objective of reaching a
   comprehensive standard for Arabic Domain Names.  Thanks are due to
   SaudiNIC for its continuous efforts in supporting the development of
   Arabic Domain Names.

   John Klensin provided some editing help with the document.


7.  References

7.1.  Normative References

   [1]   Faltstrom, P., Hoffman, P., and A. Costello,
         "Internationalizing Domain Names in Applications (IDNA)",
         RFC 3490, March 2003.

   [2]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
         for Internationalized Domain Names (IDN)", RFC 3491,
         March 2003.

   [3]   Costello, A., "Punycode: A Bootstring encoding of Unicode for
         Internationalized Domain Names in Applications (IDNA)",
         RFC 3492, March 2003.

   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

7.2.  Informative References

   [5]   Klensin, J., "Internationalized Domain Names for Applications
         (IDNA): Definitions, Background and Rationale",
         draft-ietf-idnabis-rationale-02 (work in progress),
         September 2008.

   [6]   Klensin, J., "Internationalized Domain Names in Applications
         (IDNA): Protocol", draft-ietf-idnabis-protocol-05 (work in
         progress), September 2008.

   [7]   Alvestrand, H. and C. Karp, "An updated IDNA criterion for
         right-to-left scripts", draft-ietf-idnabis-bidi-02 (work in
         progress), July 2008.

   [8]   Faltstrom, P., "The Unicode Codepoints and IDNA",
         draft-ietf-idnabis-tables-02 (work in progress), July 2008.

   [9]   United Nations Economic and Social Commission for Western Asia
         (UN-ESCWA), "Guidelines for an Arabic Domain Name System
         (ADNS)", Internet-Draft farah-adntf-adns-guidelines-03.txt,
         November 2007.

   [10]  Al-Zoman, A., "Supporting the Arabic Language in Domain Names",
         October 2003, <http://www.arabic-domains.org/docs/NIC-docs/
         SupportingArabicDomainNmaes.pdf>.

   [11]  Al-Zoman, A., "Arabic Top-Level Domains", July 2003.

         Paper presented at the EGM on Promotion of Digital Arabic
         Content, the United Nations, ESCWA, Beirut.

   [12]  League of Arab States, "Report of the first meeting of AWG-ADN,
         Damascus", February 2005,
         <http://www.arabic-domains.org/ar/intrnational-entites.php>.

         This document is in Arabic.


Authors' Addresses

   Ayman El-Sherbiny
   Information and Communication Technology Division ESCWA
   UN-House
   P.O. Box 11-8575
   Beirut
   Lebanon

   Email: El-sherbiny@un.org


   Mansour Farah
   Information and Communication Technology Division ESCWA
   UN-House
   P.O. Box 11-8575
   Beirut
   Lebanon

   Email: farah14@un.org


   Ibaa Oueichek
   Syrian Telecom Establishment
   Damascus
   Syria

   Email: oueichek@scs-net.org


   Abdulaziz H. Al-Zoman, PhD
   SaudiNIC, General Directorate of Internet Services
   IT Sector, CITC
   King Abdulaziz City for Science and Technology
   PO Box 6086
   Riyadh  11442
   Saudi Arabia

   Email: azoman@citc.gov.sa


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.