Network Working Group                                          M. Kerwin
Internet-Draft                                                       QUT
Intended status: Standards Track                           July 29, 2013
Expires: January 30, 2014

The 'file' URI Scheme
draft-kerwin-file-scheme-06

Abstract

This document specifies the file Uniform Resource Identifier (URI)
scheme that was originally specified in [RFC1738].  The purpose of
this document is to keep the information about the scheme on
standards track, since [RFC1738] has been made obsolete, and to
promote interoperability by resolving disagreements between various
implementations.

This draft should be discussed on its GitHub project page [github].

1.  Introduction
1.1.  History
1.2.  Conventions and Terminology
2.  Scheme Definition
2.1.  Syntax
3.  Implementation Notes
3.1.  Hierarchical Structure
3.2.  Relative file paths
3.3.  Drives, drive letters, mount points, file system root
3.4.  UNC File Paths
3.5.  Namespaces
4.  Encoding and Character Set Considerations
5.  Security Considerations
6.  IANA Considerations
6.1.  URI Scheme Name
6.2.  Status
6.3.  URI Scheme Syntax
6.4.  URI Scheme Semantics
6.5.  Encoding Considerations
6.6.  Applications/Protocols That Use This URI Scheme Name
6.7.  Interoperability Considerations
6.8.  Security Considerations
6.9.  Contact
6.10. Author/Change Controller
6.11. References
7.  Acknowledgements
8.  References
8.1.  Normative References
8.2.  Informative References
Author's Address

1.  Introduction

The 'file' URI scheme has historically had little or no
interoperability between platforms.  Further, implementers on a
single platform have often disagreed on the syntax to use for a
particular filesystem.  This document attempts to resolve those
problems, and define a standard scheme which is interoperable between
different extant and future implementations.  Additionally, it aims
to ease implementation by conforming to a general syntax that allows
existing URI parsing machinery to parse 'file' URIs.

URIs were previously defined in [RFC1738], which was updated by
[RFC3986].  Those documents also specify how to define schemes for
URIs.

The first definition for many URI schemes appeared in [RFC1738].
Because that document has been made obsolete, this document copies
the 'file' URI scheme from it to allow that material to remain on
standards track.

1.1.  History

This section is non-normative.

The 'file' URI scheme was first defined in [RFC1630], which, being an
informational RFC, does not specify an Internet standard.  The
definition was standardised in [RFC1738], and the scheme was
registered with the Internet Assigned Numbers Authority (IANA)
[IANA-URI-Schemes]; however, that definition omitted certain language
included by the former that clarified aspects such as:

o  the use of slashes to denote boundaries between directory levels
of a hierarchical file system; and

o  the requirement that client software convert the 'file' URI into a
file name in the local file name conventions.

The Internet draft [I-D.draft-hoffman-file-uri] was written in an
effort to keep the 'file' URI scheme on standards track when
[RFC1738] was made obsolete, but that draft expired in 2005.  It
enumerated concerns arising from the various, often conflicting
implementations of the scheme.  It serves as the basis of this
document.

The 'file' URI scheme defined in [RFC1738] is referenced three times
in the current URI Generic Syntax standard [RFC3986], despite the
former's obsoletion:

1.  RFC 3986, Section 1.1 uses "file:///etc/hosts" as an example for
identifying a resource in relation to the end-user's local
context.

2.  RFC 3986, Section 1.2.3 mentions the "file" scheme regarding
relative references.

3.  RFC 3986, Section 3.2.2 says that '...the "file" URI scheme is
defined so that no authority, an empty host, and "localhost" all
mean the end-user's machine...'.

Finally, the WHATWG defines a living URL standard [WHATWG-URL], which
includes algorithms for interpreting file URIs (as URLs).

1.2.  Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

2.  Scheme Definition

The 'file' URI scheme is used to identify and retrieve files
accessible on a particular host computer, where a "file" is a named
resource which can be accessed through the computer's filesystem
interface.

The mechanism for retrieving a representation of a dereferenced
'file' URI is through the computer's filesystem interface; for
example using the POSIX "open", "read" and "close" functions [POSIX].
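
As a non-normative illustration of the retrieval mechanism described
above, the following Python sketch reads a file through the low-level
"os.open", "os.read" and "os.close" wrappers, which correspond to the
POSIX functions of the same names.  The helper name
"read_local_file" and the chunk size are assumptions made for this
example only.

```python
import os
import tempfile


def read_local_file(path: str, chunk_size: int = 4096) -> bytes:
    """Read a whole file using the low-level open/read/close calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty read signals end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)  # always release the descriptor


# Usage: write a scratch file, read it back, then remove it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_bytes = read_local_file(tmp.name)
os.unlink(tmp.name)
```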

This scheme, unlike most other URI schemes, does not identify a
resource that is universally accessible over the Internet.

Note well that "file" refers to filesystem names from the perspective
of the user of a reference, rather than in relation to a globally-
defined naming authority, so care ought to be taken when distributing
'file' URIs to ensure that such references are actually intended to
be interpreted in relation to the user's filesystem interface.

Also note that 'file' and 'ftp' URIs are not the same, even when the
target of the 'ftp' URI is the local host.

2.1.  Syntax

The syntax of a 'file' URI conforms with the generic syntax presented
in [RFC3986], with the following components:

scheme name
The literal value "file"

authority
If present, either the fully qualified domain name of the
system on which the file is accessible; or one of the special
values "localhost" or the empty string ("").

If the authority component is omitted, or has either of the
special values "localhost" or the empty string (""), it is
interpreted as "the machine from which the URI is being
interpreted".

A host name, if supplied, is typically intended to inform a client
on a remote machine that it cannot access the file system, or
perhaps to use some other mechanism to access the file.  It does
not imply that the file ought to be accessible over a network
connection.

No retrieval mechanism for files stored on a remote machine is
defined by this specification.

path
The hierarchical directory path to the file, using the slash
("/") character to separate directories.  Implementations SHOULD
translate between the slash-separated URI syntax and the local
system's conventions for specifying file paths, where they differ.
(See: Section 3.1)

Some systems allow 'file' URIs to refer to directories.  In this
case, the path usually (but not always) includes a terminating
slash character, such as in:

file:///usr/local/bin/

The presence of a terminating slash character always indicates
that the 'file' URI refers to a directory, but the absence of a
slash does not necessarily indicate that it refers to a file.
Implementations ought to use other mechanisms to detect
directories, if and when such detection is required.

query
The query component contains non-hierarchical data that, along
with data in the path component, serves to identify a file.  For
example, in a versioning file system, the query component might be
used to refer to a specific version of a file.

Few implementations are known to use or support query components.

fragment
The semantics of a fragment component are undefined by this
specification.  A protocol that employs 'file' URIs MAY define its
own semantics for fragment components in the context of that
protocol.

Systems exhibit different levels of case-sensitivity.
Implementations SHOULD maintain the case of file and directory names
when translating 'file' URIs to and from the local system's
representation of file paths, and any systems or devices that
transport 'file' URIs SHOULD NOT alter the case of 'file' URIs they
transport.

See Section 6.3 for a formal syntax definition.

Previous definitions of the 'file' URI scheme required two slashes
between the scheme and path, so implementations that wish to remain
interoperable with older implementations ought to include an
authority component in any 'file' URIs they generate.
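
The components described in this section can be separated with a
generic URI parser.  The following non-normative Python sketch uses
"urllib.parse.urlsplit" from the standard library; the function name
"split_file_uri" and the shape of the returned dictionary are
assumptions made for illustration.  Note that this parser reports an
omitted authority and an empty authority identically, which is
harmless here because both denote the local machine.

```python
from urllib.parse import urlsplit

# Authority values that denote "the machine from which the URI is
# being interpreted" (an omitted authority also parses as "").
LOCAL_AUTHORITIES = {"", "localhost"}


def split_file_uri(uri: str) -> dict:
    """Split a 'file' URI into the components named in this section."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a 'file' URI: " + uri)
    return {
        "authority": parts.netloc,
        "is_local": parts.netloc.lower() in LOCAL_AUTHORITIES,
        "path": parts.path,  # case is preserved as received
        "query": parts.query,
        "fragment": parts.fragment,
        # A trailing slash indicates a directory; its absence proves nothing.
        "trailing_slash": parts.path.endswith("/"),
    }


local = split_file_uri("file:///usr/local/bin/")
remote = split_file_uri("file://server.example.com/Share/a.doc?v=2#top")
```

In this sketch "local" reports a local authority and a trailing
slash, while "remote" carries a host name, a query and a fragment
whose meaning is left to the application.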

3.  Implementation Notes

3.1.  Hierarchical Structure

Most implementations of the 'file' URI scheme do a reasonable job of
mapping the hierarchical part of a directory structure into the slash
("/") delimited hierarchy of the URI syntax, independent of the
native platform's delimiter.

For example, on Microsoft Windows platforms, it is typical that the
file system presents backslash ("\") as the delimiter for file
names, yet the URI's forward slash ("/") can be used in 'file' URIs
interpreted on those platforms.  Similarly, on (some) Macintosh OS
versions, at least in some contexts, the colon (":") is used as the
delimiter in the native presentation of file path names.  Unix
systems natively use the same forward slash ("/") delimiter for
hierarchy, so there is a closer mapping between 'file' URI paths and
native path names.

In accordance with Section 3.3 of [RFC3986], the path segments "."
and "..", also known as dot-segments, are only interpreted within
the URI path hierarchy and are removed as part of the resolution
process ([RFC3986], Section 5.2).  Implementations operating on or
interacting with systems that allow dot-segments in their native
path representation may be required to escape those segments using
some other means when translating to and from 'file' URIs.
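
The delimiter mapping described above can be sketched as follows.
This is a non-normative Python example; the function name
"native_path" is an assumption, drive letters (Section 3.3) are not
handled, and dot-segments are assumed to have been removed already
during URI resolution.

```python
from urllib.parse import unquote, urlsplit


def native_path(uri: str, delimiter: str) -> str:
    """Rebuild the URI path using a platform's own delimiter."""
    segments = urlsplit(uri).path.split("/")
    # Decode each segment only after splitting, so that an encoded
    # "%2F" cannot be mistaken for a hierarchy delimiter.
    return delimiter.join(unquote(segment) for segment in segments)


unix_style = native_path("file:///TMP/test.txt", "/")
windows_style = native_path("file:///TMP/test.txt", "\\")
```

Here "unix_style" is "/TMP/test.txt" and "windows_style" is
"\TMP\test.txt"; the URI itself uses the forward slash in both cases.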

3.2.  Relative file paths

As relative references are resolved into their respective (absolute)
target URIs according to Section 5 of [RFC3986], this document
does not describe that resolution.  However, a fully resolved URI may
contain a non-absolute path.  For example, using a generic parser,
the URI:

file:a/b/c

would be parsed and interpreted as: file 'c', in directory 'b', in
directory 'a', on the machine on which the URI is being interpreted
(i.e. localhost); however there is no indication of the location of
the directory 'a' on that machine.  By convention an absolute file
path would begin with a slash ("/") character on a Unix-based system,
or a drive letter (e.g. "c:\") on a Microsoft Windows system, etc.

Resolution of relative file paths is undefined by this specification.
A protocol that employs 'file' URIs MAY define its own rules for
resolution of relative file paths in the context of that protocol.
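
The behaviour of a generic parser on such a URI can be observed with
the following non-normative Python sketch, which uses
"urllib.parse.urlsplit".  The helper name "path_is_rooted" is an
assumption; it tests only for a leading slash and does not attempt to
recognise drive letters.

```python
from urllib.parse import urlsplit


def path_is_rooted(uri: str) -> bool:
    """Report whether the path of a 'file' URI begins with a slash."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a 'file' URI: " + uri)
    return parts.path.startswith("/")


parts = urlsplit("file:a/b/c")
# The parser finds no authority and a path with no leading slash, so
# nothing in the URI says where directory 'a' is located.
relative_example = (parts.netloc, parts.path, path_is_rooted("file:a/b/c"))
```

An application that receives such a URI has to apply its own rule,
or reject the URI, since this specification defines none.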

3.3.  Drives, drive letters, mount points, file system root

Historically there has been considerable difference, in practice, for
handling of the syntax for the "top" of the hierarchy.  The 'file'
URI syntax provides one simple place for designating the root of the
file hierarchy, and implementations have diverged, even on the same
platform, sometimes even within a single application.

For example, Microsoft DOS- and Windows-based systems support the
notion of a "drive letter", a single character which represents a
(virtual) drive, mount point, or device.  Native representations of
file paths start with the drive letter, a colon, and then the path;
e.g., "c:\TMP\test.txt".

Drive letters are mapped into the top of a 'file' URI in various
ways.  On systems running some versions of Microsoft Windows, the
drive letter may be specified with a colon (":") character; however,
sometimes the colon is replaced with a pipe ("|") character, and in
some implementations the colon is omitted entirely.  The three
representations MAY be considered equivalent, and any implementation
which could interact with a Microsoft Windows environment SHOULD
interpret a single letter, optionally followed by a colon or pipe
character, in the first segment of the path as a drive letter.  For
example, the following URIs:

file:///c:/TMP/test.txt
file:///c|/TMP/test.txt
file:///c/TMP/test.txt

when interpreted on the same machine, would refer to the same file:

c:\TMP\test.txt

Implementations SHOULD use a colon (":") character to specify drive
letters when generating URIs for Windows- and DOS-based systems.

Note that some systems running some versions of Microsoft Windows are
known to omit the slash before the drive letter, effectively
replacing the authority component with the drive specification; for
example, "file://c:/TMP/test.txt".  In line with Postel's robustness
principle ("an implementation must be conservative in its sending
behavior, and liberal in its receiving behavior" [RFC791])
implementations that are likely to encounter such a URI MAY interpret
it as a drive letter, but SHOULD NOT generate such URIs.
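
The interpretation rules in this section can be sketched as follows.
This non-normative Python example assumes a URI that refers to the
local machine and omits percent-decoding for brevity; the function
name "windows_path" and both regular expressions are assumptions of
the example.  For the form that places the drive in the authority,
the sketch requires an explicit colon or pipe, since a bare letter
there could equally be a host name.

```python
import re
from urllib.parse import urlsplit

# First path segment: one letter, optionally followed by ":" or "|".
_DRIVE_SEGMENT = re.compile(r"^/([A-Za-z])[:|]?(/.*)?$")
# Dubious form in which the drive takes the place of the authority.
_DRIVE_AUTHORITY = re.compile(r"^([A-Za-z])[:|]$")


def windows_path(uri: str) -> str:
    """Translate a 'file' URI carrying a drive letter to a native path."""
    parts = urlsplit(uri)
    match = _DRIVE_SEGMENT.match(parts.path)
    if match:
        drive, rest = match.group(1), match.group(2) or "/"
    else:
        match = _DRIVE_AUTHORITY.match(parts.netloc)
        if match is None:
            raise ValueError("no drive letter found in: " + uri)
        drive, rest = match.group(1), parts.path or "/"
    # Always emit the colon form, and keep the case as received.
    return drive + ":" + rest.replace("/", "\\")


equivalent = [
    windows_path("file:///c:/TMP/test.txt"),
    windows_path("file:///c|/TMP/test.txt"),
    windows_path("file:///c/TMP/test.txt"),
    windows_path("file://c:/TMP/test.txt"),  # accepted, never generated
]
```

All four entries of "equivalent" are the same native path,
"c:\TMP\test.txt".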

3.4.  UNC File Paths

The Microsoft Windows Universal Naming Convention (UNC) [MS-DTYP]
defines a convention for specifying the location of resources such as
shared files or devices, for example Windows shares accessed via the
SMB/CIFS protocol [MS-SMB2].  The general structure of a UNC file
path, given in Augmented Backus-Naur Form (ABNF) [RFC5234], is:

UNC        = "\\" hostname "\" sharename *( "\" objectname )
hostname   = <NetBIOS name, FQDN, or IP address of a server>
sharename  = <name of a share or resource to be accessed>
objectname = <the name of an object>

Note that this syntax description is non-normative.

The canonical representation of a UNC file path as a 'file' URI
copies the UNC 'hostname' into the URI 'host' field, and the UNC
'sharename' and 'objectname's, concatenated with forward slash ("/")
characters, into the 'path'.  For example, the following UNC path:

\\server.example.com\Share\path\to\file.doc

is represented as a 'file' URI canonically as:

file://server.example.com/Share/path/to/file.doc
       \________________/\_____________________/
            hostname      sharename+objectnames

Historically some implementations have translated UNC file paths
entirely into the 'path' segment of a 'file' URI, including both
leading slashes, and the Firefox web browser even prefixes the UNC
file path with another slash.  For example, the UNC path given above
might be translated as:

file:////server.example.com/Share/path/to/file.doc
       \_________________________________________/
                   translated UNC path

or even:

file://///server.example.com/Share/path/to/file.doc
        \_________________________________________/
                    translated UNC path

An implementation receiving such a URI SHOULD convert it to the
canonical representation before processing it.
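
The canonical translation, and the conversion of the historical
four- and five-slash forms, can be sketched as follows.  This
non-normative Python example omits percent-encoding; the function
names "unc_to_file_uri" and "canonicalize_unc_uri" are assumptions
made for illustration.

```python
import re


def unc_to_file_uri(unc: str) -> str:
    """Translate a UNC path into the canonical 'file' URI form."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc)
    hostname, _, rest = unc[2:].partition("\\")
    if not hostname or not rest:
        raise ValueError("UNC path needs a hostname and a sharename")
    # hostname goes to the authority; sharename and objectnames form
    # the path, joined with forward slashes.
    return "file://" + hostname + "/" + rest.replace("\\", "/")


def canonicalize_unc_uri(uri: str) -> str:
    """Rewrite the four- and five-slash forms as the canonical form."""
    match = re.match(r"^file:/{4,5}([^/]+/.*)$", uri)
    return "file://" + match.group(1) if match else uri


canonical = unc_to_file_uri(r"\\server.example.com\Share\path\to\file.doc")
```

URIs that already have two or three slashes after the scheme are
returned unchanged by "canonicalize_unc_uri".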

The 'file' URI scheme is unusual in that it does not specify an
Internet protocol or access method for shared files; as such, its
utility in network protocols between hosts is limited.  Examples of
file server protocols that do define such access methods include SMB/
CIFS [MS-SMB2], NFS [RFC3530], and NCP [NOVELL].

3.5.  Namespaces

The Microsoft Windows API defines Win32 Namespaces [Win32-Namespaces]
for interacting with files and devices using Windows API functions.
These namespaced paths are prefixed by "\\?\" for Win32 File
Namespaces and "\\.\" for Win32 Device Namespaces.  There is also a
special case for UNC file paths [MS-DTYP] in Win32 File Namespaces,
referred to as "Long UNC", using the prefix "\\?\UNC\".

This specification does not define a mechanism for translating
namespaced file paths into 'file' URIs.
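
Because no translation is defined, an implementation may wish to
detect namespaced paths and handle them separately.  The following
non-normative Python sketch recognises the three prefixes named
above; the function name "win32_namespace" and its return labels are
assumptions made for illustration.

```python
from typing import Optional


def win32_namespace(path: str) -> Optional[str]:
    """Classify a native path by its Win32 namespace prefix, if any."""
    if path.startswith("\\\\?\\UNC\\"):  # checked first: longest prefix
        return "long-unc"
    if path.startswith("\\\\?\\"):
        return "file"
    if path.startswith("\\\\.\\"):
        return "device"
    return None  # an ordinary, non-namespaced path


kinds = [
    win32_namespace(r"\\?\UNC\server\share\file.doc"),
    win32_namespace(r"\\?\C:\TMP\test.txt"),
    win32_namespace(r"\\.\COM1"),
    win32_namespace(r"C:\TMP\test.txt"),
]
```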

4.  Encoding and Character Set Considerations

As specified in [RFC3986], the 'file' URI scheme allows any character
from the Universal Character Set (UCS, [ISO10646]) encoded as UTF-8
[RFC3629] and then percent-encoded in valid ASCII [RFC20].

If the local file system uses a known non-Unicode character encoding,
the file path SHOULD be converted to a sequence of Unicode characters
normalized according to Normalization Form C (NFC, [UTR15]).

Before applying any percent-encoding, an application MUST ensure the
following about the string that is used as input to the URI-
construction process:

o  The host, if any, consists only of Unicode code points that
conform to the rules specified in [RFC5892].

o  Internationalized domain name (IDN) labels are encoded as A-labels
[RFC5890].
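
The encoding steps described in this section can be sketched as
follows.  This non-normative Python example uses the standard
"unicodedata" and "urllib.parse" modules; the function names
"encode_path" and "encode_host" are assumptions.  Note that Python's
built-in "idna" codec implements the older IDNA 2003 rules rather
than those of [RFC5890], so it serves here only to show the shape of
the A-label conversion.

```python
import unicodedata
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Normalize a path to NFC, then percent-encode its UTF-8 bytes."""
    normalized = unicodedata.normalize("NFC", path)
    # quote() encodes as UTF-8 by default; "/" is kept as the delimiter.
    return quote(normalized, safe="/")


def encode_host(host: str) -> str:
    """Convert internationalized labels to ASCII A-labels."""
    # Assumption: the stdlib codec (IDNA 2003) stands in for IDNA 2008.
    return host.encode("idna").decode("ascii")


# "e" followed by a combining acute accent becomes one precomposed
# character under NFC before it is percent-encoded.
encoded_path = encode_path("/tmp/cafe\u0301.txt")
encoded_host = encode_host("b\u00fccher.example")
```

With these inputs "encoded_path" is "/tmp/caf%C3%A9.txt" and
"encoded_host" is "xn--bcher-kva.example".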

5.  Security Considerations

There are many security considerations for URI schemes discussed in
[RFC3986].

File access and the granting of privileges for specific operations
are complex topics, and the use of 'file' URIs can complicate the
security model in effect for file privileges.  Under no circumstance
should software using 'file' URIs grant greater access than would be
available for other file access methods.

6.  IANA Considerations

In accordance with the guidelines and registration procedures for new
URI schemes [RFC4395], this section provides the information needed
to update the registration of the 'file' URI scheme.

6.1.  URI Scheme Name

file

6.2.  Status

permanent

6.3.  URI Scheme Syntax

The 'file' URI syntax is defined here in Augmented Backus-Naur Form
(ABNF) [RFC5234], including the core ABNF syntax rule 'ALPHA' defined
by that specification, and borrowing the 'host', 'path-absolute' and
'segment' rules from [RFC3986] (updated by [RFC6874]).

fileURI       = "file" ":" ( auth-file / local-file )

auth-file     = "//" ( host-file / no-host-file )
host-file     = hostpart path-absolute
                ;   file://<host>/<path>
                ;   file://localhost/<path>

no-host-file  = path-abs       ; begins with "/"
              / path-abs-win   ; begins with drive-letter
                ;   file:///<path>
                ;   file://c:/<path> *

local-file    = path-absolute  ;   file:/<path>
              / path-abs-win   ;   file:c:/<path>

hostpart      = "localhost" / host

path-abs      = 1*( "/" segment )

path-abs-win  = drive-letter path-absolute
drive-letter  = ALPHA [ drive-marker ]
drive-marker  = ":" / "|"

* The 'no-host-file' rule allows for dubious URIs that encode a
Windows drive letter as the authority component.  See Section 3.3 of
RFC XXXX.  [Note to RFC Editor: please replace XXXX with the number
issued to this document.]
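
For experimentation, the grammar above can be transliterated into a
regular expression.  The following Python sketch is non-normative
and simplified: the 'host' rule is approximated by the 'reg-name'
rule of [RFC3986] (so bracketed IP literals are not accepted), and
all names in the code are assumptions made for illustration.  The
pattern follows the rules literally and accepts only what they
accept.

```python
import re

PCT = r"%[0-9A-Fa-f]{2}"
# pchar: unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9._~!$&'()*+,;=:@-]|" + PCT + ")"
# Simplification: 'host' is approximated by reg-name only.
REG_NAME = r"(?:[A-Za-z0-9._~!$&'()*+,;=-]|" + PCT + ")*"
SEGMENT = PCHAR + "*"
PATH_ABSOLUTE = "/(?:" + PCHAR + "+(?:/" + SEGMENT + ")*)?"
PATH_ABS = "(?:/" + SEGMENT + ")+"
PATH_ABS_WIN = "[A-Za-z][:|]?" + PATH_ABSOLUTE

AUTH_FILE = (
    "//(?:" + REG_NAME + PATH_ABSOLUTE  # host-file
    + "|" + PATH_ABS                    # no-host-file, begins with "/"
    + "|" + PATH_ABS_WIN + ")"          # no-host-file, drive letter
)
LOCAL_FILE = "(?:" + PATH_ABSOLUTE + "|" + PATH_ABS_WIN + ")"

# ABNF string literals are case-insensitive, hence IGNORECASE.
FILE_URI = re.compile("file:(?:" + AUTH_FILE + "|" + LOCAL_FILE + ")",
                      re.IGNORECASE)


def is_file_uri(text: str) -> bool:
    """Report whether the whole string matches the fileURI rule."""
    return FILE_URI.fullmatch(text) is not None


accepted = [
    "file:///etc/hosts",
    "file://localhost/etc/hosts",
    "file:/etc/hosts",
    "file:c:/TMP/test.txt",
    "file://c:/TMP/test.txt",
]
results = [is_file_uri(uri) for uri in accepted]
```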

6.4.  URI Scheme Semantics

See Section 2 of RFC XXXX.  [Note to RFC Editor: please replace XXXX
with the number issued to this document.]

6.5.  Encoding Considerations

See Section 4 of RFC XXXX.  [Note to RFC Editor: please replace XXXX
with the number issued to this document.]

6.6.  Applications/Protocols That Use This URI Scheme Name

Web browsers:

o  Firefox

Note: Firefox has a unique interpretation of RFC 1738 (See:
Bugzilla#107540 [1]), which affects UNC paths.

o  Chromium

o  Internet Explorer

o  Opera

Other applications/protocols:

o  Windows API [2,3]

o  Perl LWP

6.7.  Interoperability Considerations

Due to the convoluted history of the 'file' URI scheme there are many,
varied implementations in existence.  Many have converged over time,
forming a few kernels of closely-related functionality, and RFC XXXX
attempts to accommodate such common functionality.  [Note to RFC
Editor: please replace XXXX with the number issued to this document.]
However there will always be exceptions, and this fact is recognised.

6.8.  Security Considerations

See Section 5 of RFC XXXX.  [Note to RFC Editor: please replace XXXX
with the number issued to this document.]

6.9.  Contact

Matthew Kerwin, matthew@kerwin.net.au

6.10.  Author/Change Controller

This scheme is registered under the IETF tree.  As such, the IETF
maintains change control.

6.11.  References

[1]  "Bug 107540", Bugzilla@Mozilla, October 2007,
<https://bugzilla.mozilla.org/show_bug.cgi?id=107540>.

[2]  "PathCreateFromUrl function", MSDN, June 2013,
<http://msdn.microsoft.com/en-us/library/windows/desktop/
bb773581(v=vs.85).aspx>.

[3]  "UrlCreateFromPath function", MSDN, June 2013,
<http://msdn.microsoft.com/en-us/library/windows/desktop/
bb773773(v=vs.85).aspx>.

7.  Acknowledgements

This specification is derived from RFC 1738 [RFC1738], RFC 3986
[RFC3986], and I-D draft-hoffman-file-uri (expired)
[I-D.draft-hoffman-file-uri]; the acknowledgements in those documents
still apply.

Author's Address

Matthew Kerwin
Queensland University of Technology
Victoria Park Rd
Kelvin Grove, QLD  4059
Australia

EMail: matthew.kerwin@qut.edu.au

