Network Working Group                           A. ElSherbiny, UN-ESCWA
Internet-Draft                                  M. Farah, UN-ESCWA
draft-farah-adntf-ling-guidelines-00.txt        A. Al Zoman, SaudiNIC
Category: Standards Track                       I. Oueichek, STE
Expires: August 2008                            February 2008

                       Linguistic Guidelines for
            the Use of Arabic Characters in Internet Domains

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements.  Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state and
status of this protocol.  Distribution of this memo is unlimited.

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware have
been or will be disclosed, and any of which he or she becomes aware will
be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Copyright Notice

Copyright (C) The IETF TRUST (2008).

Abstract

This document constitutes technical specifications for the use of Arabic
characters in Internet Domain names and provides linguistic guidelines
for Arabic Domain Names.  It addresses Arabic-specific linguistic issues
pertaining to the use of Arabic language in domain names.



Farah, et al.                  Standards Track                  [Page 1]

Internet-Draft            Linguistic Guidelines            February 2008

Table of Contents

1. Introduction........................................................2
2. Arabic Language-Specific Issues.....................................2
   2.1 Linguistic issues...............................................3
   2.2 Supported Character Set.........................................4
   2.3 Arabic Linguistic Issues Affected By Technical Constraints......6
3. Security Considerations.............................................7
4. IANA Considerations.................................................7
5. Conclusion..........................................................7
6. Acknowledgments.....................................................7
Normative References...................................................7
Informative References.................................................7
Authors' Addresses.....................................................7
Full Copyright Statement...............................................8
Disclaimer of Validity.................................................9

1. Introduction

In March 2003, the Internet Engineering Task Force (IETF) issued a set
of RFCs for Internationalized Domain Names (IDN) [N1, N2, N3] that are
intended to become the de facto standard for all languages.  In 2007,
new Internet-Drafts proposing revisions to the IDNA protocol were
released, as follows:

   - Klensin, J., "Internationalizing Domain Names for Applications
   (IDNA): Issues and Rationale", Work in Progress, November 2007.
   - Klensin, J., "Internationalizing Domain Names in Applications
   (IDNA): Protocol", Work in Progress, November 2007.
   - Alvestrand, H. and Karp, C., "An IDNA problem in right-to-left
   scripts", Work in Progress, July 2007.
   - Faltstrom, P., "The Unicode Codepoints and IDN", Work in Progress,
   November 2007.

This document constitutes a technical specification for the
implementation of the IDN standards in the case of the Arabic Language.
It will allow the use of standard language tables to write domain names
in Arabic characters.  Therefore, it should be considered as a logical
extension to the IDN standards.

This document reflects the recommendations of the Arab Working Group on
Arabic Domain Names (AWG-ADN) established by The League of Arab States
(LAS), based on standardisation efforts of the United Nations Economic
and Social Commission for Western Asia (UN-ESCWA) and its Internet-
Draft: Farah, et al. "Guidelines for an Arabic Internet Domain Name"
(draft-farah-adntf-adns-guidelines-03.txt).

The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" in
this document are to be interpreted as described in RFC 2119.

2. Arabic Language-Specific Issues

The main objective of the creation of Arabic Domain Names is to have a
vehicle to increase Internet use amongst all strata of the Arabic-
speaking communities.

Furthermore, a domain name that is not user-friendly would add to the
ambiguity and strangeness of the Internet for Arabic-speaking
communities, thus hindering the spread of the Internet and leading to
further isolation of these communities at the global level.

Hence, there have been intensive efforts, especially those spearheaded
by Dr. Al-Zoman and contributed to by UN-ESCWA and its Arabic Domain
Names Task Force (ADN-TF), to reach consensus on a multitude of
linguistic issues with the following goals:

   - To define the accepted Arabic character set to be used for writing
   domain names in Arabic, which is the subject of this document.
   - To define the top-level domains of the Arabic domain name tree
   structure (i.e., Arabic gTLDs and ccTLDs).  This goal will be handled
   in a separate document.

The first meeting of the AWG-ADN, held in Damascus in January-February
2005, gave special attention to the following:

   (a) Simplification of the domain names, whenever possible, to
   facilitate the interaction of the Arabic user with the Internet.
   (b) Adoption of solutions that do not lead to confusion either in
   reading or in writing, provided that this does not compromise the
   linguistic correctness of the words used.
   (c) Mixing Arabic and non-Arabic letters in the domain name is not
   acceptable.

   2.1 Linguistic issues

   There are a number of linguistic issues that have been raised with
   respect to the use of the Arabic language in domain names.  This
   section highlights some of them.  It is based on the paper of
   Dr. Al-Zoman [I1] and the report of the first meeting of the
   AWG-ADN [N4].  For details, the reader is encouraged to review the
   references.

      2.1.1. Tashkeel (diacritics) and Shadda

      Tashkeel and Shadda must not be stored in the zone file.  They
      may be supported in the user interface only, and are stripped
      off at the preparation of internationalized strings (stringprep)
      phase.  Their Unicode code points are as follows:

         U+064B ARABIC FATHATAN
         U+064C ARABIC DAMMATAN
         U+064D ARABIC KASRATAN
         U+064E ARABIC FATHA
         U+064F ARABIC DAMMA
         U+0650 ARABIC KASRA
         U+0651 ARABIC SHADDA
         U+0652 ARABIC SUKUN
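
      The stripping step described in this section can be sketched as
      follows.  This is a minimal illustration only: the function name
      strip_tashkeel and the sample input are assumptions made for the
      example, and a real implementation would perform this step inside
      its stringprep profile rather than as a standalone function.

```python
# Code points listed in Section 2.1.1 (Tashkeel and Shadda): U+064B..U+0652.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0652 + 1)}


def strip_tashkeel(label: str) -> str:
    """Return the label with Tashkeel and Shadda marks removed.

    Hypothetical helper; the draft only states that the marks are
    stripped before the label reaches the zone file.
    """
    return "".join(ch for ch in label if ch not in TASHKEEL)


# Illustrative input: MEEM + FATHA, DAL, REH + SHADDA.
typed = "\u0645\u064E\u062F\u0631\u0651"
stored = strip_tashkeel(typed)  # MEEM, DAL, REH only
```

      The user interface may keep displaying the typed form, while only
      the stripped form would be passed on for storage.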

      2.1.2. Kasheeda or Tatweel (Horizontal Character Size Extension)

      Kasheeda (U+0640 ARABIC TATWEEL) must not be used in Arabic domain
      names.

      2.1.3. Character folding

      Character folding is the process where multiple letters (that may
      have some similarity with respect to their shapes) are folded into
      one shape.  This includes:

         - Folding Teh Marbuta (U+0629) and Heh (U+0647) at the end of a
         word;
         - Folding different forms of Hamzah (U+0622, U+0623, U+0625,
         U+0627);
         - Folding Alef Maksura (U+0649) and Yeh (U+064A) at the end of
         a word;
         - Folding Waw with Hamzah Above (U+0624) and Waw (U+0648).

      With respect to the Arabic language, character folding is not
      acceptable because it changes the meaning of words and violates
      the spelling rules of the language.  Replacing a character with
      another character, even one of similar shape, gives a different
      meaning.  Folding would leave a single word representing several
      distinct words, namely all the combinations of the folded
      characters, so that the other words are masked by that single
      word [I1].

      Misspellings and handwriting errors do occur and lead to these
      characters being mixed up, although this is not the case in
      published and printed materials.  One of the motivations of this
      effort is to preserve the language, particularly with the spread
      of globalization.  Character folding works against this
      motivation, since it would have a negative effect on the
      principles and ethics of the language.  Technology should work
      to preserve the language, not to destroy it.  Thus, character
      folding should not be allowed.
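
      The masking effect can be made concrete with a small sketch.  The
      folding table below is hypothetical: it is assembled from the four
      cases listed in this section purely to show the collision, and it
      represents the operation that this document rejects, not a
      recommended procedure.  The two sample strings are illustrative
      spellings that differ only in their final letter.

```python
# Hypothetical folding table built from the four cases in Section 2.1.3.
# The draft REJECTS this operation; it is shown only to expose collisions.
ANYWHERE = {
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0624": "\u0648",  # WAW WITH HAMZA ABOVE  -> WAW
}
WORD_FINAL = {
    "\u0629": "\u0647",  # TEH MARBUTA  -> HEH
    "\u0649": "\u064A",  # ALEF MAKSURA -> YEH
}


def fold(word: str) -> str:
    """Apply the (rejected) folding to a single word."""
    if not word:
        return word
    body = "".join(ANYWHERE.get(ch, ch) for ch in word)
    return body[:-1] + WORD_FINAL.get(body[-1], body[-1])


with_teh_marbuta = "\u0645\u062F\u0631\u0633\u0629"
with_heh = "\u0645\u062F\u0631\u0633\u0647"

distinct_before = with_teh_marbuta != with_heh            # True
collide_after = fold(with_teh_marbuta) == fold(with_heh)  # True
```

      Without folding, the two spellings remain separate registrable
      names; with folding, only one of them could exist.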

   2.2 Supported Character Set

   A domain name to be written in Arabic must be composed of a sequence
   of the following UNICODE characters.  These are based on UNICODE
   version 5.0.

   TABLE 1: CHARACTERS FROM UNICODE ARABIC TABLE (0600-06FF)

   Unicode     Character Name
   0621     ARABIC LETTER HAMZA
   0622     ARABIC LETTER ALEF WITH MADDA ABOVE
   0623     ARABIC LETTER ALEF WITH HAMZA ABOVE
   0624     ARABIC LETTER WAW WITH HAMZA ABOVE
   0625     ARABIC LETTER ALEF WITH HAMZA BELOW
   0626     ARABIC LETTER YEH WITH HAMZA ABOVE
   0627     ARABIC LETTER ALEF
   0628     ARABIC LETTER BEH
   0629     ARABIC LETTER TEH MARBUTA
   062A     ARABIC LETTER TEH
   062B     ARABIC LETTER THEH
   062C     ARABIC LETTER JEEM
   062D     ARABIC LETTER HAH
   062E     ARABIC LETTER KHAH
   062F     ARABIC LETTER DAL
   0630     ARABIC LETTER THAL
   0631     ARABIC LETTER REH
   0632     ARABIC LETTER ZAIN
   0633     ARABIC LETTER SEEN
   0634     ARABIC LETTER SHEEN
   0635     ARABIC LETTER SAD
   0636     ARABIC LETTER DAD
   0637     ARABIC LETTER TAH
   0638     ARABIC LETTER ZAH
   0639     ARABIC LETTER AIN
   063A     ARABIC LETTER GHAIN
   0641     ARABIC LETTER FEH
   0642     ARABIC LETTER QAF
   0643     ARABIC LETTER KAF
   0644     ARABIC LETTER LAM
   0645     ARABIC LETTER MEEM
   0646     ARABIC LETTER NOON
   0647     ARABIC LETTER HEH
   0648     ARABIC LETTER WAW
   0649     ARABIC LETTER ALEF MAKSURA
   064A     ARABIC LETTER YEH
   0660     ARABIC-INDIC DIGIT ZERO
   0661     ARABIC-INDIC DIGIT ONE
   0662     ARABIC-INDIC DIGIT TWO
   0663     ARABIC-INDIC DIGIT THREE
   0664     ARABIC-INDIC DIGIT FOUR
   0665     ARABIC-INDIC DIGIT FIVE
   0666     ARABIC-INDIC DIGIT SIX
   0667     ARABIC-INDIC DIGIT SEVEN
   0668     ARABIC-INDIC DIGIT EIGHT
   0669     ARABIC-INDIC DIGIT NINE

   Source: A. Al-Zoman, "Supporting the Arabic Language in Domain
   Names", October 2003

   TABLE 2:  CHARACTERS FROM UNICODE BASIC LATIN TABLE (0000-007F):

   Unicode     Character Name
   0030     DIGIT ZERO
   0031     DIGIT ONE
   0032     DIGIT TWO
   0033     DIGIT THREE
   0034     DIGIT FOUR
   0035     DIGIT FIVE
   0036     DIGIT SIX
   0037     DIGIT SEVEN
   0038     DIGIT EIGHT
   0039     DIGIT NINE
   002D     HYPHEN-MINUS
   002E     FULL STOP (Dot)

   Source: A. Al-Zoman, "Supporting the Arabic Language in Domain
   Names", October 2003
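
   A membership check against the two tables above might look like the
   following sketch.  The names ALLOWED, disallowed_chars, and
   is_supported are assumptions made for the example, and treating
   U+002E as the label separator rather than as a label character is
   likewise an interpretation, not something this document specifies.

```python
# Allowed label characters taken from Table 1 and Table 2.
# U+002E FULL STOP is treated here as the separator between labels.
ALLOWED = (
    set(range(0x0621, 0x063A + 1))    # HAMZA .. GHAIN
    | set(range(0x0641, 0x064A + 1))  # FEH .. YEH
    | set(range(0x0660, 0x0669 + 1))  # Arabic-Indic digits
    | set(range(0x0030, 0x0039 + 1))  # ASCII digits
    | {0x002D}                        # HYPHEN-MINUS
)


def disallowed_chars(domain: str) -> list:
    """Return the characters in `domain` that fall outside the tables."""
    bad = []
    for label in domain.split("."):
        bad.extend(ch for ch in label if ord(ch) not in ALLOWED)
    return bad


def is_supported(domain: str) -> bool:
    """True if every label is non-empty and uses only table characters."""
    labels = domain.split(".")
    return all(labels) and not disallowed_chars(domain)
```

   Because the tables contain no Latin letters, no Tashkeel, and no
   Tatweel (U+0640), such a check rejects mixed-script names and the
   characters excluded in Section 2.1 without any additional rule.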

   2.3 Arabic Linguistic Issues Affected By Technical Constraints

   In this section, technical aspects of some linguistic issues are
   discussed.

      2.3.1. Numerals

      In the Arab countries, two sets of numerical digits are in use:
         - Set I: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) mostly used in the
         western part of the Arab world.
         - Set II: (U+0660, U+0661, U+0662, U+0663, U+0664, U+0665,
         U+0666, U+0667, U+0668, U+0669) mostly used in the eastern part
         of the Arab world.

      Although visual differentiation between the Arabic zero (U+0660)
      and the dot (U+002E) in printed material is possible (the zero is
      larger in size and is printed higher than the dot), using it in
      domain names may lead to confusion.  Folding Set II to Set I
      eliminates the problem of the zero in particular, and that of
      numerals in general.

      Both sets may be supported in the user interface but both must be
      folded to one set (Set I) at the preparation of internationalized
      strings (e.g., "stringprep") phase; i.e. storage of numerals in
      the zone file is done in ASCII format.
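
      The numeral folding described above can be sketched as a simple
      code point translation.  The names DIGIT_FOLD and fold_digits are
      assumptions made for the example; in practice the mapping would be
      part of the stringprep profile.

```python
# Map Set II (U+0660..U+0669) onto Set I (U+0030..U+0039).
DIGIT_FOLD = {0x0660 + i: ord("0") + i for i in range(10)}


def fold_digits(label: str) -> str:
    """Replace Arabic-Indic digits with their ASCII equivalents."""
    return label.translate(DIGIT_FOLD)


folded = fold_digits("\u0661\u0662\u0663")  # "123"
```

      Characters outside Set II, including ASCII digits that are already
      in Set I, pass through unchanged.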

      2.3.2. The Space Character

      The space character is strictly not allowed in domain names, as it
      is a control character.  Instead, the hyphen (Al-sharta)
      (i.e., U+002D) is proposed as a separator between Arabic words,
      because confusion can arise if Arabic words are typed without a
      separator, unlike in ASCII.

      It is acceptable to use the hyphen to separate words within the
      same domain name label.
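
      As an illustration, a user interface could offer the hyphenated
      form of a multi-word phrase.  This document only proposes the
      hyphen as a separator; replacing spaces automatically, and the
      helper names join_words and has_space, are assumptions made for
      this sketch.

```python
def join_words(phrase: str) -> str:
    """Join whitespace-separated words with U+002D HYPHEN-MINUS."""
    return "-".join(phrase.split())


def has_space(label: str) -> bool:
    """True if the label contains whitespace and so cannot be used as is."""
    return any(ch.isspace() for ch in label)


# Two illustrative Arabic words typed with a space between them.
typed = "\u0645\u0648\u0642\u0639 \u0645\u062B\u0627\u0644"
suggested = join_words(typed)
```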

3. Security Considerations

No particular security considerations could be identified regarding the
use of Arabic characters in writing domain names.  In particular, any
potential visual confusion between different character strings is
avoided using the guidelines proposed in this document.

4. IANA Considerations

This document has no action for IANA.

5. Conclusion

The proposed guidelines are in full accordance with the IETF IDN
standards and take into account Arabic language-specific issues within a
compromise between grammatical rules of the Arabic language and the ease
of use of the language on the Internet.

6. Acknowledgments

ESCWA ICT Division provided support and funding for the development of
this document with the objective of reaching a comprehensive standard
for Arabic Domain Names.  Thanks are due to SaudiNIC for its
continuous efforts in supporting the development of Arabic Domain Names.

Normative References

[N1] Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing
Domain Names in Applications (IDNA)", RFC 3490, March 2003.
[N2] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[N3] Costello, A., "Punycode: A Bootstring encoding of Unicode for
Internationalized Domain Names in Applications (IDNA)", RFC 3492,
March 2003.
[N4] League of Arab States, report of the first meeting of AWG-ADN,
Damascus, February 2005. http://www.arabic-domains.org/ar/
intrnational-entites.php

Informative References

[I1] A. Al-Zoman, "Supporting the Arabic Language in Domain Names",
October 2003, http://www.arabic-domains.org/docs/NIC-docs/
SupportingArabicDomainNmaes.pdf
[I2] A. Al-Zoman, "Arabic Top-Level Domains", paper presented in EGM on
promotion of Digital Arabic Content, the United Nations, ESCWA, Beirut,
June 2003.

Authors' Addresses

Abdulaziz H. Al-Zoman, PhD
Director
SaudiNIC, General Directorate of Internet Services, IT Sector, CITC
Riyadh, Saudi Arabia

Email: azoman@citc.gov.sa

Ayman El-Sherbiny
Information and Communication Technology Division
ESCWA, UN-House
P.O. Box 11-8575
Beirut, Lebanon

Email: El-sherbiny@un.org

Mansour Farah
Information and Communication Technology Division
ESCWA, UN-House
P.O. Box 11-8575
Beirut, Lebanon

Email: farah14@un.org

Ibaa Oueichek
Syrian Telecom Establishment
Damascus, Syria

Email: oueichek@scs-net.org

Comments are solicited and should be addressed to the working
group's mailing list at ESCWA-ICTD@un.org and/or the author(s).

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.

This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY
RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.

Disclaimer of validity

The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology
described in this document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.  Information on the procedures with respect to rights
in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other
proprietary rights that may cover technology that may be required
to implement this standard.  Please address the information to the
IETF at ietf-ipr@ietf.org.

This document expires in August 2008.