INTERNET DRAFT                                       Davide Musella
draft-musella-html-metatag-03.txt               Institute for Multimedia
                                                      Technologies
                                                National Research Council

24 March 1997
Expires in six months

                       The META Tag of HTML


Status of this Memo

This document is an Internet-Draft.  Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups.  Note that other groups may also distribute working documents as
Internet-Drafts.  Internet-Drafts are draft documents valid for a maximum
of six months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast) or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to:

Davide Musella
(e-Mail) davide@itim.mi.cnr.it
(voice) +39.(0)2.70643271
(fax) +39.(0)2.70643292


Abstract

This document defines a strict syntax for cataloguing an HTML document
using the META tag of HTML.  The definition aims to establish a base
subset of cataloguing keys that provides a preliminary classification
method.



1 - Introduction

At present the syntax of the META HTTP-EQUIV tag is not strict, so
different keywords can be used to define the same thing.  Declarations
like this:

<META   HTTP-EQUIV = "authors"
        CONTENT = "Pennac, Benni">

or

<META   HTTP-EQUIV = "writers"
        CONTENT = "Pennac, Benni">

could represent the same concept with two different syntaxes.  The aim of
this Draft is to define the words that describe the content of an HTML
document, without excluding a more specific classification carried out
with different techniques.  The method used to accomplish this was defined
at the "Distributed Indexing/Searching Workshop"
[http://www.w3.org/pub/www/Searching/9605-Indexing-Workshop/index.html]
and uses a defined prefix to indicate which cataloguing method is used to
describe a classification key.

2 - The META Tag

The META element is used within the HEAD element to embed document
meta-information not defined by other HTML elements.  Such information can
be extracted by servers/clients for use in identifying, indexing and
cataloguing specialized document meta-information.  It is generally
preferable to use named elements that have well defined semantics for each
type of meta-information.  The META element is provided for situations
where strict SGML parsing is necessary and the local DTD is not
extensible.  In addition, HTTP servers can read the content of the
document head to generate response headers corresponding to any element
defining a value for the attribute HTTP-EQUIV.  This provides document
authors with a mechanism (not necessarily the preferred one) for
identifying information that should be included in the response headers
of an HTTP request.

The META element has three attributes:

  NAME
  HTTP-EQUIV
  CONTENT

The META tag may be used anywhere in the HEAD part.  Multiple META tags
referring to the same string must be considered linked, and their contents
combined (concatenated as a comma-separated list).
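
The combining rule above can be illustrated with a minimal Python sketch.
It is not part of this definition.  The names MetaCollector and
collect_meta are hypothetical, and two choices are assumptions made only
for illustration: keys are compared case-insensitively, and HTTP-EQUIV and
NAME keys share one table.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META elements, merging repeated keys into one value."""

    def __init__(self):
        super().__init__()
        self.merged = {}  # lower-cased key -> combined CONTENT string

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("http-equiv") or attributes.get("name")
        content = attributes.get("content")
        if not key or content is None:
            return
        key = key.strip().lower()  # assumption: keys are case-insensitive
        if key in self.merged:
            # Repeated key: concatenate as a comma-separated list.
            self.merged[key] += ", " + content.strip()
        else:
            self.merged[key] = content.strip()


def collect_meta(html_text):
    """Return a dict mapping each META key to its combined content."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.merged


sample = (
    '<HEAD>'
    '<META   HTTP-EQUIV = "Keywords"\n        CONTENT = "manual">'
    '<META   HTTP-EQUIV = "keywords"\n        CONTENT = "scouting">'
    '</HEAD>'
)
print(collect_meta(sample))  # {'keywords': 'manual, scouting'}
```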

3 - NAME

This attribute can be used to define some properties such as "number of
pages" or "preferred browser" or any information an author wants to insert
in his document. An example:

<META   NAME = "Maybe Published By"
        CONTENT = "McDraw Bill">

or

<META   NAME = "keywords"
        CONTENT = "manual, scouting">

Do not use the META element to define information that should be
associated with an existing HTML element.


4 - HTTP-EQUIV

This attribute binds the element to an HTTP response header.  If the
semantics of the HTTP response header named by this attribute are known,
then the contents can be processed based on a well defined syntactic
mapping, whether or not the DTD includes anything about it.  An HTTP
server must process these tags for a HEAD HTTP request.  Do not give an
HTTP-EQUIV attribute the same name as a response header that should
typically only be generated by the HTTP server.  Some inappropriate names
are "Server", "Date", and "Last-Modified".  Whether a name is
inappropriate depends on the particular server implementation.  It is
recommended that servers ignore any META element that specifies HTTP
equivalents (case insensitively) to their own reserved response headers.
The HTTP-EQUIV attribute has the same semantic value as the NAME
attribute, except for the HTTP repercussions.
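
The recommendation to ignore reserved names could be implemented along the
following lines.  This is a minimal Python sketch under stated
assumptions: the function name equiv_headers is hypothetical, and the
reserved set holds only the three names cited above, whereas a real server
would use its own list.

```python
# Names cited in the text as inappropriate; a real server may reserve more.
RESERVED_HEADERS = {"server", "date", "last-modified"}


def equiv_headers(meta_pairs, reserved=RESERVED_HEADERS):
    """Return the (header, value) pairs a server could emit.

    meta_pairs is an iterable of (HTTP-EQUIV name, CONTENT value) tuples.
    Names matching a reserved header, compared case-insensitively, are
    dropped.
    """
    headers = []
    for name, content in meta_pairs:
        if name.strip().lower() in reserved:
            continue  # the server generates this header itself
        headers.append((name.strip(), content))
    return headers


pairs = [("Author", "Plutarco"), ("SERVER", "MyServer/1.0"), ("Language", "it")]
print(equiv_headers(pairs))  # [('Author', 'Plutarco'), ('Language', 'it')]
```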

5 - CONTENT

Used to supply a value for a named property.  It can contain more than
one piece of information.

6 - Cataloguing an HTML document

The META tag can be used to classify an HTML document; with this method
the author can control how the document is indexed.  The intention is to
define a base set of meta information oriented towards the ordinary user.
Most authors of HTML documents have no specialist background: they are
neither librarians nor Internet specialists, so their knowledge of
cataloguing problems is limited.  Internet users tend to avoid what they
do not know; therefore, to encourage the use of meta information, I have
defined the following keys for a first rough cataloguing of an HTML
document:

Author: to indicate the author/s of the document,
        Ex:
        <META   HTTP-EQUIV = "Author"
                CONTENT = "Plutarco">
        To distinguish the name from the surname, they must be separated
        by an underscore character "_" (ASCII [95]), with the name(s)
        first and then the surname; for example:
        <META   HTTP-EQUIV = "Author"
                CONTENT = "Milan_Kundera, Georg Wilhelm Friedrich_Hegel,
                Leonardo_Da Vinci">

Description:  used to describe the document contents.
        It must be reasonably shorter than the whole document.
        Ex:
        <META   HTTP-EQUIV ="Description"
                CONTENT ="This is the xxxxxx's home page. Here you'll find
                a lot of photos of my last holiday and a really big FAQ
                archive">

Expire: to indicate the expiry date of the document (HTTP date format or
        "none" to indicate a document whose content doesn't expire).
        Ex:
        <META   HTTP-EQUIV ="Expire"
                CONTENT ="13 Apr 1997 00:00 GMT">

Keywords: to indicate the keywords of the document.  It is a sequence of
        comma-separated phrases.
        In terms of boolean logic, the AND operator is represented by the
        SPACE (ASCII[32]) and the OR operator by the COMMA (ASCII[44]).
        The AND operator is processed before the OR operator.  So a string
        like "Red ball, White pen" means "(Red AND ball) OR (White AND
        pen)".
        Ex:
        <META   HTTP-EQUIV = "Keywords"
                CONTENT = "Italian Products, Italian Tourism, Italy">
        The spaces between a comma and a word or vice versa are ignored.

Language: its content specifies the language in which the document is
        written: it is composed of two or three language-code letters,
        based on ISO-639 or ISO-639/2 respectively, optionally followed by
        a dash (ASCII[45]) and a two-letter ISO-3166 country code to
        represent the national variants.
        Ex:
        <META   HTTP-EQUIV = "Language"
                CONTENT  ="it">

Publisher: to indicate the organization responsible for publishing the
        document in its current form.
        Ex:
        <META   HTTP-EQUIV ="Publisher"
                CONTENT ="Mc Draw-Bill">

Timestamp: to indicate when the document was authored (HTTP date format).
        Ex:
        <META   HTTP-EQUIV ="Timestamp"
                CONTENT ="25 Mar 1997 08:30 GMT">
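
An indexing agent could read the Author and Keywords conventions above
roughly as follows.  This is a minimal Python sketch, not part of the
definition.  The function names parse_authors and parse_keywords are
hypothetical, and the treatment of an Author entry without an underscore
(kept whole, with an empty name part) is an assumption.

```python
def parse_authors(content):
    """Split an Author value into (names, surname) tuples."""
    authors = []
    for entry in content.split(","):
        entry = " ".join(entry.split())  # collapse line breaks and spaces
        if not entry:
            continue
        names, separator, surname = entry.partition("_")
        if separator:
            authors.append((names.strip(), surname.strip()))
        else:
            # Assumption: without "_" the parts cannot be told apart.
            authors.append(("", entry))
    return authors


def parse_keywords(content):
    """Return a list of OR alternatives, each a list of AND-ed words."""
    alternatives = []
    for phrase in content.split(","):  # COMMA is the OR operator
        words = phrase.split()         # SPACE is the AND operator
        if words:
            alternatives.append(words)
    return alternatives


print(parse_authors("Milan_Kundera, Leonardo_Da Vinci"))
# [('Milan', 'Kundera'), ('Leonardo', 'Da Vinci')]
print(parse_keywords("Red ball, White pen"))
# [['Red', 'ball'], ['White', 'pen']]
```

Because both functions split on whitespace, the spaces around a comma are
ignored, as the Keywords definition requires.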

The TITLE information (concerning the title of the document) is taken
from the content of the TITLE tag, to avoid useless redundancy.  It is
highly recommended to use the HTTP-EQUIV attribute instead of NAME, so
that an agent can obtain this meta information without requesting the
full document.  A more complex description of the text content could be
added, without removing this meta information, using more specific
techniques such as the Dublin Core or the MCF.
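
The shape of a Language value, as described in this section, can be
checked with a short Python sketch.  It is illustrative only.  The name
parse_language is hypothetical, the pattern tests only the number of
letters and the dash (it does not check the codes against the ISO lists),
and the case normalization of the result is an assumption.

```python
import re

# Two or three letters, optionally a dash and a two-letter country code.
LANGUAGE_VALUE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z]{2})?")


def parse_language(content):
    """Return (language, country) or None if the value is malformed.

    country is None when no national variant is given.
    """
    value = content.strip()
    if LANGUAGE_VALUE.fullmatch(value) is None:
        return None
    language, _, country = value.partition("-")
    # Assumption: lower-case language code, upper-case country code.
    return language.lower(), (country.upper() or None)


print(parse_language("it"))       # ('it', None)
print(parse_language("en-US"))    # ('en', 'US')
print(parse_language("italian"))  # None
```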


Appendix 1

HTTP date format

The HTTP date format is defined as:

HTTP-date    = rfc1123-date | rfc850-date | asctime-date

where
          rfc1123-date = wkday "," SP date1 SP time SP "GMT"
          rfc850-date  = weekday "," SP date2 SP time SP "GMT"
          asctime-date = wkday SP date3 SP time SP 4DIGIT

The RFC850 format and the asctime format are obsolete (they are kept
only for backward compatibility), so it is highly recommended to use the
rfc1123 format:


rfc1123-date = [wkday "," SP ] date1 SP time

date1 = 1*2DIGIT SP month SP 4DIGIT (day month year)
        Ex: 25 Feb 1997

time =  hour SP zone

hour =  2DIGIT ":" 2DIGIT [":" 2DIGIT] (hours:minutes[:seconds])
        Ex: 22:55:30

wkday = "Mon" | "Tue" | "Wed"
              | "Thu" | "Fri" | "Sat" | "Sun"

month = "Jan" | "Feb" | "Mar" | "Apr"
              | "May" | "Jun" | "Jul" | "Aug"
                      | "Sep" | "Oct" | "Nov" | "Dec"

zone =  "UT"  | "GMT"                         ; Universal Time
                                              ; North American : UT
              |  "EST" | "EDT"                ;  Eastern:  - 5 | - 4
              |  "CST" | "CDT"                ;  Central:  - 6 | - 5
              |  "MST" | "MDT"                ;  Mountain: - 7 | - 6
              |  "PST" | "PDT"                ;  Pacific:  - 8 | - 7
              |  1ALPHA                       ; Military: Z = UT;
              | ( ("+" | "-") 4DIGIT )        ; Local differential
                                              ;  hours+min. (HHMM)

rfc1123-date examples:

28 Apr 1997 19:30 GMT
Mon, 28 Apr 1997 19:30:00 GMT
28 Apr 1997 20:30 +0100
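
The recommended format can be parsed with a short Python sketch such as
the following.  It is an illustration, not a reference implementation.
The name parse_date is hypothetical.  As simplifying assumptions, the
weekday is not checked against the date, and single-letter military zones
other than "Z" are rejected.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Offsets in hours from Universal Time, as listed in the zone rule.
NAMED_ZONES = {"UT": 0, "GMT": 0, "Z": 0,
               "EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
               "MST": -7, "MDT": -6, "PST": -8, "PDT": -7}

DATE_PATTERN = re.compile(
    r"(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?"   # optional wkday
    r"(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+"     # day month year
    r"(\d{2}):(\d{2})(?::(\d{2}))?\s+"            # hours:minutes[:seconds]
    r"(\S+)"                                      # zone
)


def parse_date(text):
    """Parse a date in the recommended format into an aware datetime."""
    match = DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError("not in the recommended date format: %r" % text)
    day, month, year, hour, minute, second, zone = match.groups()
    if month not in MONTHS:
        raise ValueError("unknown month: %r" % month)
    if zone in NAMED_ZONES:
        offset = timedelta(hours=NAMED_ZONES[zone])
    elif re.fullmatch(r"[+-]\d{4}", zone):
        sign = -1 if zone[0] == "-" else 1
        offset = sign * timedelta(hours=int(zone[1:3]),
                                  minutes=int(zone[3:5]))
    else:
        raise ValueError("unsupported zone: %r" % zone)
    return datetime(int(year), MONTHS.index(month) + 1, int(day),
                    int(hour), int(minute), int(second or 0),
                    tzinfo=timezone(offset))


# The three examples above denote the same instant.
first = parse_date("28 Apr 1997 19:30 GMT")
second = parse_date("Mon, 28 Apr 1997 19:30:00 GMT")
third = parse_date("28 Apr 1997 20:30 +0100")
print(first == second == third)  # True
```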

