INTERNET-DRAFT                                                 E. Nebel
Form-based File Upload in HTML                              L. Masinter
draft-ietf-html-fileupload-02.txt                     Xerox Corporation
Expires in 6 months                                      April 19, 1995


                Form-based File Upload in HTML

Status of this Memo

  This document is an Internet-Draft.  Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF), its areas,
  and its working groups.  Note that other groups may also distribute
  working documents as Internet-Drafts.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time.  It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as ``work in progress.''

  To learn the current status of any Internet-Draft, please check the
  ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
  ftp.isi.edu (US West Coast).


1. Abstract

  Currently, HTML forms allow the producer of the form to request
  information from the user reading the form.  These forms have proven
  useful in a wide variety of applications in which input from the
  user is necessary.  However, this capability is limited because HTML
  forms don't provide a way to ask the user to submit files of data.
  Service providers who need to get files from the user have had to
  implement custom user applications.  (Examples of these custom
  browsers have appeared on the www-talk mailing list.)  Since
  file-upload is a feature that will benefit many applications, this
  draft proposes an extension to HTML to allow information providers
  to express file upload requests uniformly, and a MIME compatible
  representation for file upload responses.  This draft also includes
  a description of a backward compatibility strategy that allows new
  servers to interact with the current HTML user agents.

  The proposal is independent of which version of HTML it becomes a
  part of.

2.  HTML forms with file submission

  The current draft HTML specification defines eight possible values
  for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
  PASSWORD, RADIO, RESET, SUBMIT, TEXT.

  In addition, it defines the ENCTYPE attribute of the FORM element
  using the POST METHOD to have the default value
  "application/x-www-form-urlencoded".

  This proposal makes three changes:
  1) add a FILE option for the TYPE attribute of INPUT
  2) allow an ACCEPT attribute for the INPUT tag, which is a list of
     media types or type patterns allowed for the input
  3) allow the ENCTYPE of a FORM to be "multipart/form-data".

  These changes might be considered independently, but are all
  necessary for reasonable file upload.

  The author of an HTML form who wants to request one or more files
  from a user would write (for example):

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>

    File to process: <INPUT NAME="userfile1" TYPE="file">

    <INPUT TYPE="submit" VALUE="Send File">

    </FORM>

  The change to the HTML DTD is to add one item to the entity
  "InputType". In addition, it is proposed that the INPUT tag have an
  ACCEPT attribute, which is a list of comma-separated media types.


  ... (other elements) ...

  <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                         RADIO | SUBMIT | RESET |
                         IMAGE | HIDDEN | FILE )">
  <!ELEMENT INPUT - O EMPTY>
  <!ATTLIST INPUT
          TYPE %InputType TEXT
          NAME CDATA #IMPLIED  -- required for all but submit and reset --
          VALUE CDATA #IMPLIED
          SRC %URI #IMPLIED  -- for image inputs --
          CHECKED (CHECKED) #IMPLIED
          SIZE CDATA #IMPLIED  -- like NUMBERS,
                                  but delimited with comma, not space --
          MAXLENGTH NUMBER #IMPLIED
          ALIGN (top|middle|bottom) #IMPLIED
          ACCEPT CDATA #IMPLIED -- list of content types --
          >

  ... (other elements) ...

3.  Suggested implementation

  While user agents that interpret HTML have wide leeway to choose the
  most appropriate mechanism for their context, this section suggests
  how one class of user agent, WWW browsers, might implement file
  upload.

3.1 Display of FILE widget

  When an INPUT tag of type FILE is encountered, the browser might show
  a display of (previously selected) file names, and a "Browse" button
  or selection method. Selecting the "Browse" button would cause the
  browser to enter into a file selection mode appropriate for the
  platform. Window-based browsers might pop up a file selection
  window, for example. In such a file selection dialog, the user would
  have the option of replacing a current selection, adding a new file
  selection, etc. Browser implementors might choose to let the list of
  file names be manually edited.

  If an ACCEPT attribute is present, the browser might constrain the
  file patterns prompted for to match those with the corresponding
  appropriate file extensions for the platform.
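
  As an illustration only, the following Python sketch shows one way
  a browser might test a candidate file against an ACCEPT list.  The
  function name matches_accept, the use of shell-style wildcards for
  "type patterns", and the fallback to application/octet-stream for
  unknown extensions are assumptions of this sketch, not part of the
  proposal.

```python
import fnmatch
import mimetypes


def matches_accept(accept_attr: str, filename: str) -> bool:
    """Return True if the guessed media type of filename matches ACCEPT."""
    # ACCEPT is a comma-separated list of media types or type patterns.
    patterns = [p.strip().lower() for p in accept_attr.split(",") if p.strip()]
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumption: unknown extensions are treated as application/octet-stream.
    media_type = (guessed or "application/octet-stream").lower()
    return any(fnmatch.fnmatchcase(media_type, p) for p in patterns)
```

  A file selection dialog could use such a test to filter the file
  names it offers, while still letting the user override the filter.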

3.2 Action on submit

  When the user completes the form, and selects the SUBMIT element,
  the browser should send the form data and the content of the
  selected files.  The encoding type application/x-www-form-urlencoded
  is inefficient for sending large quantities of binary data.  Thus, a
  new media type, multipart/form-data, is proposed as a way of
  efficiently sending the values associated with a filled-out form
  from client to server.


3.3 Use of multipart/form-data

  The definition of multipart/form-data is included in section 7.
  The media-type multipart/form-data follows the rules of all
  multipart MIME data streams as outlined in RFC 1521--a boundary is
  selected that does not occur in any of the data.  Each field of the
  form is sent, in the order in which it occurs in the form, as a part
  of the multipart stream.  Each part identifies the INPUT name within
  the original HTML form using a "content-disposition: form-data" header
  with a name attribute specifying the field name.  Each part has an
  optional Content-Type (which defaults to text/plain). File inputs
  should be identified as either application/octet-stream or the
  appropriate media type, if known. If multiple files were selected,
  they should be transferred together using the multipart/mixed
  format.
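
  The requirement that the boundary not occur in any of the data
  could be met, for example, by generating random candidates and
  checking them against the payloads.  The Python sketch below is
  one possible approach; the name choose_boundary and the use of a
  random 32-character hexadecimal token are assumptions of this
  sketch.

```python
import secrets


def choose_boundary(payloads: list[bytes]) -> str:
    """Pick a boundary string that occurs in none of the payloads."""
    while True:
        candidate = secrets.token_hex(16)  # 32 hexadecimal characters
        delimiter = b"--" + candidate.encode("ascii")
        # Reject the candidate if its delimiter appears anywhere in the data.
        if not any(delimiter in payload for payload in payloads):
            return candidate
```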

  The "content-transfer-encoding" header should be supplied for all
  fields whose values do not conform to the default 7BIT encoding
  (that is, 7-bit US-ASCII data with lines no longer than 1000
  characters).  In that case, file data and longer field values may
  be transferred using a content-transfer-encoding appropriate to the
  protocol of the ACTION in the form.  For HTTP applications, a
  content-transfer-encoding of "binary" may be used.  If the ACTION is
  a "mailto:" URL, then the user agent may encode the data
  appropriately for the mail transport mechanism.  [See section 5 of
  RFC 1521 for more details.]

  File inputs may optionally identify the file name using the
  "filename" attribute on the content-disposition header.  This is not
  required, but is a convenience for those cases where, for
  example, the uploaded files might contain references to each other,
  e.g., a TeX file and its .sty auxiliary style description.
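
  The rules in this section can be sketched in Python as follows.
  This is a minimal illustration rather than a reference encoder: the
  function name encode_form_data and the tuple layout for files are
  hypothetical, ordinary field values are assumed to be 7-bit text,
  each file field is assumed to carry a single file, and names are
  assumed to contain no quotes or line breaks.

```python
def encode_form_data(fields, boundary):
    """Encode (name, value) pairs as a multipart/form-data body.

    value is either a str (ordinary field) or a tuple
    (filename, media_type, data_bytes) describing one file.
    """
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields:
        headers = [f'Content-Disposition: form-data; name="{name}"']
        if isinstance(value, str):
            data = value.encode("ascii")  # assumption: 7-bit text only
        else:
            filename, media_type, data = value
            headers[0] += f'; filename="{filename}"'
            headers.append(f"Content-Type: {media_type}")
            # Suitable for HTTP; a mail transport would need another encoding.
            headers.append("Content-Transfer-Encoding: binary")
        chunks.append(delimiter + b"\r\n")
        chunks.append("\r\n".join(headers).encode("ascii") + b"\r\n\r\n")
        chunks.append(data + b"\r\n")
    chunks.append(delimiter + b"--\r\n")  # closing delimiter
    return b"".join(chunks)
```

  The caller would send the result with a header such as
  "Content-type: multipart/form-data; boundary=..." naming the same
  boundary.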


  On the server end, the ACTION might point to an HTTP URL that
  implements the form's action via CGI. In such a case, the CGI program
  would note that the content-type is multipart/form-data and parse the
  various fields (checking for validity, writing the file data to local
  files for subsequent processing, etc.).
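
  A server-side program might parse the submission along the lines
  of the Python sketch below, which reuses the MIME parser in
  Python's standard "email" package.  The function name
  parse_form_data and the dictionary keys are hypothetical; how the
  program obtains the content-type and body (e.g., from the CGI
  environment and standard input) is outside this sketch.

```python
import email


def parse_form_data(content_type: str, body: bytes) -> list[dict]:
    """Parse a multipart/form-data request body into a list of parts."""
    # Re-attach the Content-Type header so the MIME parser sees the boundary.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = email.message_from_bytes(prefix + body)
    if (message.get_content_type() != "multipart/form-data"
            or not message.is_multipart()):
        raise ValueError("not a well-formed multipart/form-data submission")
    parts = []
    for part in message.get_payload():
        parts.append({
            "name": part.get_param("name", header="content-disposition"),
            "filename": part.get_filename(),  # None for ordinary fields
            "type": part.get_content_type(),  # defaults to text/plain
            # For a nested multipart/mixed part this is None; the caller
            # would iterate over part.get_payload() instead.
            "data": part.get_payload(decode=True),
        })
    return parts
```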

3.4 Interpretation of other attributes

  The VALUE attribute might be used with <INPUT TYPE=file> tags for
  a default file name. This use is probably platform dependent.
  It might be useful, however, in sequences of more than one
  transaction, e.g., to avoid having the user prompted for the same
  file name over and over again.

  The SIZE attribute might be specified using SIZE=width,height, where
  width is some default for file name width, while height is the
  expected size showing the list of selected files.  For example, this
  would be useful for forms designers who expect to get several files
  and who would like to show a multiline file input field in the
  browser (with a "browse" button beside it, hopefully).  It would be
  useful to show a one line text field when no height is specified
  (when the forms designer expects one file, only) and to show a
  multiline text area with scrollbars when the height is greater than
  1 (when the forms designer expects multiple files).
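
  A browser might interpret the SIZE value roughly as in the Python
  sketch below.  The function name parse_size is hypothetical, and
  treating a missing height as 1 (a one-line field) is this sketch's
  reading of the text above.

```python
def parse_size(size_attr: str) -> tuple[int, int]:
    """Split SIZE="width,height" into (width, height)."""
    width_text, _comma, height_text = size_attr.partition(",")
    width = int(width_text)
    # No height given: assume a one-line field for a single file.
    height = int(height_text) if height_text.strip() else 1
    return width, height
```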

4.  Backward compatibility issues

  While not necessary for successful adoption of an enhancement to the
  current WWW form mechanism, it is useful to also plan for a
  migration strategy: users with older browsers can still participate
  in file upload dialogs, using a helper application. Most current web
  browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT
  TYPE=TEXT> and give the user a text box. The user can type a file
  name into this text box. In addition, current browsers seem to
  ignore the ENCTYPE parameter in the <FORM> element, and always
  transmit the data as application/x-www-form-urlencoded.

  Thus, the server CGI might be written in a way that would note that
  the form data returned had content-type
  application/x-www-form-urlencoded instead of
  multipart/form-data, and know that the user was using a browser
  that didn't implement file upload.
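
  The check described above might look like the following Python
  sketch.  The function name supports_file_upload is hypothetical,
  and the header value is assumed to have been obtained already
  (for a CGI program, typically from its environment).

```python
def supports_file_upload(content_type_header: str) -> bool:
    """Decide from the request content-type whether the browser
    implemented file upload."""
    # Ignore parameters such as the boundary; compare the media type only.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type == "multipart/form-data":
        return True
    if media_type == "application/x-www-form-urlencoded":
        return False
    raise ValueError("unexpected form content-type: " + media_type)
```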

  In this case, rather than replying with a "text/html" response, the
  CGI on the server could instead send back a data stream that a helper
  application might process instead; this would be a data stream of
  type "application/x-please-send-files", which contains:

  * The (fully qualified) URL to which the actual form data should
    be posted (terminated with CRLF)
  * The list of field names that were supposed to be file contents
    (space separated, terminated with CRLF)
  * The entire original application/x-www-form-urlencoded form data
    as originally sent from client to server.
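
  The three items listed above might be assembled and taken apart
  as in the Python sketch below.  The function names are
  hypothetical, and the URL and field names are assumed to be
  US-ASCII.

```python
def build_please_send_files(action_url, file_fields, original_form_data):
    """Build an application/x-please-send-files stream (server side)."""
    return (
        action_url.encode("ascii") + b"\r\n"               # target URL
        + " ".join(file_fields).encode("ascii") + b"\r\n"  # file field names
        + original_form_data                               # urlencoded data
    )


def parse_please_send_files(stream):
    """Split the stream back into its three components (helper side)."""
    url, fields, form_data = stream.split(b"\r\n", 2)
    return url.decode("ascii"), fields.decode("ascii").split(), form_data
```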

  In this case, the browser needs to be configured to process
  application/x-please-send-files to launch a helper application.

  The helper would read the form data, note which fields contained
  'local file names' that needed to be replaced with their data
  content, might itself prompt the user for changing or adding to the
  list of files available, and then repackage the data & file contents
  in multipart/form-data for retransmission back to the server.
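
  The first step of the helper, deciding which fields hold local
  file names, might be sketched in Python as follows.  The function
  name plan_repackaging and the (kind, name, value) tuples are
  hypothetical; reading the files, prompting the user, and encoding
  the result are left out.

```python
from urllib.parse import parse_qsl


def plan_repackaging(form_data: bytes, file_fields: list[str]):
    """Classify each original field as plain text or a local file name."""
    plan = []
    pairs = parse_qsl(form_data.decode("ascii"), keep_blank_values=True)
    for name, value in pairs:
        # Fields named by the server carry file names, not file contents.
        kind = "file" if name in file_fields else "text"
        plan.append((kind, name, value))
    return plan
```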

  The helper would generate the kind of data that a 'new' browser should
  actually have sent in the first place, with the intention that the URL
  to which it is sent corresponds to the original ACTION URL. The point
  of this is that the server can use the *same* CGI to implement the
  mechanism for dealing with both old and new browsers.

  The helper need not display the form data, but *should* ensure that
  the user actually be prompted about the suitability of sending the
  files requested (this is to avoid a security problem with malicious
  servers that ask for files that weren't actually promised by the
  user.) It would be useful if the status of the transfer of the files
  involved could be displayed.

5.  Other considerations

5.1 Compression, encryption

  This scheme doesn't address the possible compression of files.
  After some consideration, it seemed that the optimization issues of
  file compression were too complex to try to automatically have
  browsers decide that files should be compressed.  Many link-layer
  transport mechanisms (e.g., high-speed modems) perform data
  compression over the link, and optimizing for compression at this
  layer might not be appropriate. It might be possible for browsers to
  optionally produce a content-transfer-encoding of x-compress for
  file data, and for servers to decompress the data before processing,
  if desired; this was left out of the proposal, however.

  Similarly, the proposal does not contain a mechanism for encryption
  of the data; this should be handled by whatever other mechanisms are
  in place for secure transmission of data, whether via secure HTTP or
  mail.

5.2 Deferred file transmission

  In some situations, it might be advisable to have the server
  validate various elements of the form data (user name, account,
  etc.)  before actually preparing to receive the data.  However,
  after some consideration, it seemed best to require that servers
  that wish to do this should implement this as a series of forms,
  where some of the data elements that were previously validated might
  be sent back to the client as 'hidden' fields, or by arranging the
  form so that the elements that need validation occur first.  This
  puts the onus of maintaining the state of a transaction only on
  those servers that wish to build a complex application, while
  allowing those cases that have simple input needs to be built
  simply. Clients are encouraged to supply content-length for overall
  file input so that a busy server could detect if the proposed file
  data is too large to be processed reasonably and just return an
  error code and close the connection without waiting to process all
  of the incoming data.
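
  A busy server might apply the content-length test before reading
  any file data, as in this Python sketch.  The function name
  upload_too_large and the limit parameter are hypothetical; what
  the server does when no length is supplied is a policy choice not
  addressed here.

```python
def upload_too_large(content_length_header, limit_bytes):
    """Return True if the declared Content-Length exceeds the limit."""
    if content_length_header is None:
        # Length unknown: the server must decide while reading the data.
        return False
    return int(content_length_header) > limit_bytes
```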

  If the INPUT tag includes the attribute MAXLENGTH, the user agent
  should consider its value to represent the maximum Content-Length
  (in bytes) which the server will accept for transferred files.  In
  this way, servers can hint to the client how much space they have
  available for a file upload, before that upload takes place.  It is
  important to note, however, that this is only a hint, and the actual
  requirements of the server may change between form creation and file
  submission.

5.3 Other choices for return transmission of binary data

  Various people have suggested using a new MIME top-level type
  "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of
  "packet" to express indeterminate-length binary data, rather than
  relying on the multipart-style boundaries.  While we are not opposed
  to doing so, this would require additional design and
  standardization work to get acceptance of "aggregate".  On the other
  hand, the 'multipart' mechanisms are well established, simple to
  implement on both the sending client and receiving server, and as
  efficient as other methods of dealing with multiple combinations of
  binary data.

5.4 Not overloading <INPUT>

  Various people have wondered about the advisability of overloading
  'INPUT' for this function, rather than merely providing a different
  type of FORM element.  Among other considerations, the migration
  strategy which is allowed when using <INPUT> is important.  In
  addition, the <INPUT> field *is* already overloaded to contain most
  kinds of data input; rather than creating multiple kinds of <INPUT>
  tags, it seems most reasonable to enhance <INPUT>.  The 'type' of
  INPUT is not the content-type of what is returned, but rather the
  'widget-type'; i.e., it identifies the interaction style with the
  user.  The description here is carefully written to allow <INPUT
  TYPE=FILE> to work for text browsers or audio-markup.

5.5 Default content-type of field data

  Many input fields in HTML are to be typed in. There has been some
  ambiguity as to how form data should be transmitted back to servers.
  Making the content-type of <INPUT> fields be text/plain clearly
  disambiguates that the client should properly encode the data before
  sending it back to the server with CRLFs.
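
  For example, a client might normalize the line breaks of typed-in
  text before transmission, as in this Python sketch (the function
  name to_crlf is hypothetical):

```python
import re


def to_crlf(text: str) -> str:
    """Convert bare CR, bare LF, and CRLF line breaks to CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)
```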

5.6 Allow form ACTION to be "mailto:"

  Independent of this proposal, it would be very useful for HTML
  interpreting user agents to allow an ACTION in a form to be a
  "mailto:" URL. This seems like a good idea, with or without this
  proposal. Similarly, the ACTION for an HTML form which is received
  via mail should probably default to the "reply-to:" of the message.
  These two proposals would allow HTML forms to be served via HTTP
  servers but sent back via mail, or, alternatively, allow HTML forms
  to be sent by mail, filled out by HTML-aware mail recipients, and
  the results mailed back.

5.7 Remote files with third-party transfer

  In some scenarios, the user operating the client software might want
  to specify a URL for remote data rather than a local file. In this
  case, is there a way to allow the browser to send to the server a
  pointer to the external data rather than the entire contents? This
  capability could be implemented, for example, by having the client
  send to the server data of type "message/external-body" with
  "access-type" set to, say, "uri", and the URL of the remote data in
  the body of the message.

5.8 File transfer with ENCTYPE=application/x-www-form-urlencoded

  If a form contains <INPUT TYPE=file> elements but does not contain
  an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
  It is probably inappropriate to attempt to URL-encode large
  quantities of data to servers that don't expect it.

5.9 CRLF used as line separator

  As with all MIME transmissions, CRLF is used as the separator for
  lines in a POST of the data in multipart/form-data.

6. Examples

   Suppose the server supplies the following HTML:

     <FORM ACTION="http://server.dom/cgi/handle"
           ENCTYPE="multipart/form-data"
           METHOD=POST>
     What is your name? <INPUT TYPE=TEXT NAME=submitter>
     What files are you sending? <INPUT TYPE=FILE NAME=pics>
     </FORM>

   and the user types "Joe Blow" in the name field, and selects
   a text file "file1.txt" and also an image file "file2.gif" for
   the answer to 'What files are you sending?'.

   The client would send back the following data:

   Content-type: multipart/form-data; boundary=AaB03x

   --AaB03x
   content-disposition: form-data; name="submitter"

   Joe Blow
   --AaB03x
   content-disposition: form-data; name="pics"
   Content-type: multipart/mixed; boundary=BbC04y

   --BbC04y
   Content-disposition: attachment; filename="file1.txt"
   Content-Type: text/plain
   Content-Transfer-Encoding: binary

   ... contents of file1.txt ...
   --BbC04y
   Content-disposition: attachment; filename="file2.gif"
   Content-type: image/gif
   Content-Transfer-Encoding: binary

   ...contents of file2.gif...
   --BbC04y--
   --AaB03x--
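
   The nested part of this example, in which several files are
   returned for one field, might be produced as in the Python sketch
   below.  The function names encode_files_mixed and wrap_as_field
   are hypothetical; the sketch assumes an HTTP transport (hence the
   "binary" encoding) and file names without quotes or line breaks.

```python
def encode_files_mixed(files, boundary):
    """Encode (filename, media_type, data) triples as multipart/mixed."""
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for filename, media_type, data in files:
        header = (
            f'Content-Disposition: attachment; filename="{filename}"\r\n'
            f"Content-Type: {media_type}\r\n"
            "Content-Transfer-Encoding: binary\r\n\r\n"
        )
        chunks.append(delimiter + b"\r\n" + header.encode("ascii")
                      + data + b"\r\n")
    chunks.append(delimiter + b"--\r\n")  # closing inner delimiter
    return b"".join(chunks)


def wrap_as_field(name, inner_body, inner_boundary):
    """Prefix the inner body with the headers of its form-data part."""
    header = (
        f'Content-Disposition: form-data; name="{name}"\r\n'
        f"Content-Type: multipart/mixed; boundary={inner_boundary}\r\n\r\n"
    )
    return header.encode("ascii") + inner_body
```

   The outer multipart/form-data encoder would then place the wrapped
   part between its own delimiters, using a different boundary.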


7. Registration of multipart/form-data

 The media-type multipart/form-data follows the rules of all
 multipart MIME data streams as outlined in RFC 1521. It is intended
 for use in returning the data that comes about from filling out a
 form. In a form (in HTML, although other applications may also use
 forms), there are a series of fields to be supplied by the user who
 fills out the form. Each field has a name. The name of the field
 is restricted to be a set of US-ASCII graphic characters; within a
 given form, the names are unique.

 multipart/form-data contains a series of parts. Each part is expected
 to contain a content-disposition header where the value is
 "form-data" and a name attribute specifies the field name within the
 form, e.g., 'content-disposition: form-data; name="xxxxx"', where
 xxxxx is the field name corresponding to that field.  As with all
 multipart MIME types, each part has an optional Content-Type which
 defaults to text/plain.

 Note that MIME headers are generally required to consist only of
 7-bit data in the US-ASCII character set. This specification thus
 requires that the field names used consist of 7-bit US-ASCII
 characters.
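
 A form processor might check field names as in the Python sketch
 below.  The function name is_valid_field_name is hypothetical, and
 reading "US-ASCII graphic characters" as the range 0x21 through
 0x7E (excluding space) is an assumption of this sketch.

```python
def is_valid_field_name(name: str) -> bool:
    """Return True if name is non-empty and uses only US-ASCII
    graphic characters (assumed here to be 0x21 through 0x7E)."""
    return bool(name) and all(0x21 <= ord(ch) <= 0x7E for ch in name)
```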

 If the contents of a file are returned via filling out a form, then
 the file input is identified as application/octet-stream or the
 appropriate media type, if known.  If multiple files are to be
 returned as the result of a single form entry, they can be returned
 as multipart/mixed embedded within the multipart/form-data.

  The "content-transfer-encoding" header should be supplied for all
  fields whose values do not conform to the default 7BIT encoding
  (that is, 7-bit US-ASCII data with lines no longer than 1000
  characters).

  In that case, file data and longer field values may be
  transferred using a content-transfer-encoding appropriate to the
  protocol of the ACTION in the form.  For HTTP applications, a
  content-transfer-encoding of "binary" may be used.  If the ACTION is
  a "mailto:" URL, then the user agent may encode the data
  appropriately for the mail transport mechanism.  [See section 5 of
  RFC 1521 for more details.]

  File inputs may also identify the file name. The file name may be
  described using the 'filename' parameter of the
  "content-disposition" header. This is not required, but is strongly
  recommended in any case where the original filename is known. This
  is useful or necessary in many applications.


8. Security Considerations

  It is important that a user agent not send any file that the user
  has not asked to be sent, explicitly. Thus, HTML interpreting agents
  are expected to confirm any default file names that might be
  suggested with <INPUT TYPE=file VALUE="yyyy">.  Hidden fields must
  never be able to specify a file.
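
  One way to enforce explicit confirmation is to pass every
  candidate file, including any default suggested by VALUE, through
  a user prompt, as in this Python sketch.  The function name
  files_to_send and the confirm callback are hypothetical; a real
  user agent would implement the prompt with its own dialog.

```python
def files_to_send(candidates, confirm):
    """Return only the files that the user explicitly approves.

    confirm is a callable that asks the user about one path and
    returns True or False.
    """
    return [path for path in candidates if confirm(path)]
```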

9.  Conclusion

  The suggested implementation gives the client a lot of flexibility in
  the number and types of files it can send to the server.  It gives the
  server control of the decision to accept the files, and it gives
  servers a chance to interact with browsers which do not support INPUT
  TYPE "file".

  The change to the HTML DTD is very simple, but very powerful.  It
  enables a much greater variety of services to be implemented via the
  World-Wide Web than is currently possible due to the lack of a file
  submission facility.  This would be an extremely valuable addition to
  the capabilities of the World-Wide Web.


A.  Authors' Addresses

  Larry Masinter                          masinter@parc.xerox.com
  Xerox Palo Alto Research Center         Voice:  (415) 812-4365
  3333 Coyote Hill Road                   Fax:    (415) 812-4333
  Palo Alto, CA 94304

  Ernesto Nebel                           nebel@xsoft.sd.xerox.com
  XSoft, Xerox Corporation                Voice:  (619) 676-7817
  10875 Rancho Bernardo Road, Suite 200   Fax:    (619) 676-7865
  San Diego, CA 92127-2116


B. Media type registration for multipart/form-data

Media Type name:
 multipart

Media subtype name:
 form-data

Required parameters:
 none

Optional parameters:
 none

Encoding considerations:
 No additional considerations other than as for other multipart types.

Published specification:
 draft-ietf-html-fileupload-02.txt

Security Considerations:

  The multipart/form-data type introduces no new security
  considerations beyond what might occur with any of the enclosed
  parts.

Person & email address to contact for further information:

   Larry Masinter
   masinter@parc.xerox.com

