Network Working Group                                     T. Hansen, Ed.
Internet-Draft                                         AT&T Laboratories
Intended status: Informational                               L. Masinter
Expires: December 28, 2014                                      M. Hardy
                                                           Adobe Systems
                                                           June 26, 2014


              PDF for an RFC Series Output Document Format
                     draft-hansen-rfc-use-of-pdf-00

Abstract
<1>
   This document discusses options and requirements for the PDF
   rendering of RFCs in the RFC Series, as outlined in RFC 6949.  It
   also discusses the use of PDF for Internet Drafts, and available or
   needed software tools for producing and working with PDF.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 28, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of




Hansen, et al.          Expires December 28, 2014               [Page 1]


Internet-Draft                PDF for RFCs                     June 2014


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  History and current use of PDF with RFCs and Internet Drafts    3
     2.1.  RFCs  . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  Internet Drafts . . . . . . . . . . . . . . . . . . . . .   3
   3.  Options and Requirements for PDF RFCs . . . . . . . . . . . .   3
     3.1.  "Visible" requirements  . . . . . . . . . . . . . . . . .   4
       3.1.1.  General visible requirements  . . . . . . . . . . . .   4
       3.1.2.  Page size, margins, headings  . . . . . . . . . . . .   4
       3.1.3.  Similarity to other outputs . . . . . . . . . . . . .   4
       3.1.4.  Typeface choices  . . . . . . . . . . . . . . . . . .   5
       3.1.5.  Hyperlinks  . . . . . . . . . . . . . . . . . . . . .   5
     3.2.  "Invisible" options and requirements  . . . . . . . . . .   5
       3.2.1.  Internal Text Representation  . . . . . . . . . . . .   6
       3.2.2.  Unicode Support . . . . . . . . . . . . . . . . . . .   7
       3.2.3.  Metadata Support  . . . . . . . . . . . . . . . . . .   7
       3.2.4.  Document Structure Support  . . . . . . . . . . . . .   7
       3.2.5.  Tagged PDF  . . . . . . . . . . . . . . . . . . . . .   8
       3.2.6.  Embedded Files  . . . . . . . . . . . . . . . . . . .   8
       3.2.7.  Document Signatures . . . . . . . . . . . . . . . . .   8
   4.  Tooling . . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   5.  Picking a PDF Profile . . . . . . . . . . . . . . . . . . . .   9
   6.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     6.1.  Informative References  . . . . . . . . . . . . . . . . .   9
     6.2.  URIs  . . . . . . . . . . . . . . . . . . . . . . . . . .   9
   Appendix A.  A Synopsis of PDF Format History . . . . . . . . . .  10
     A.1.  PDF/A . . . . . . . . . . . . . . . . . . . . . . . . . .  11
     A.2.  PDF/UA  . . . . . . . . . . . . . . . . . . . . . . . . .  11
     A.3.  Additional Reading  . . . . . . . . . . . . . . . . . . .  11
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction
<2>
   The RFC Series is evolving, as outlined in [RFC6949].  Future
   documents will use an archival format of XML with renderings in
   various formats, including PDF.
<3>
   PDF has a wide range of capabilities and alternatives; not all PDFs
   are "equal".  (See Appendix A for a brief history of PDF and its
   options.)  For example, visually similar documents could consist of
   scanned or rasterized images, or could instead contain real text
   along with layout options, hyperlinks, embedded fonts, and digital
   signatures.  This document explains the options and makes
   recommendations for choices, both for the RFC Series and for
   Internet Drafts.



<4>
   The PDF format and the tools to manipulate it are not as well known
   as those for other formats.  This document discusses some of the
   processes for creating and using PDFs, and the open source and
   commercial products that support them.
<5>
   NOTE: see [1] for XML source, related files, and an issue tracker.

2.  History and current use of PDF with RFCs and Internet Drafts
<6>
   NOTE: this section is meant as an overview to give some background.

2.1.  RFCs
<7>
   The RFC Series has for a long time accepted PostScript renderings of
   RFCs, either in addition to or instead of the text renderings of
   those same RFCs.  These have usually been produced when there was a
   complicated figure or mathematics within the document.  For example,
   consider the figures and mathematics found in RFC 1119 and RFC 1142,
   and compare the figures found in the text version of RFC 3550 with
   those in the PostScript version.  The RFC Editor has also provided
   PDF renderings of RFCs.  Usually, the PDF has been a print of the
   text file that does not take advantage of any of the broader PDF
   functionality, unless there was a PostScript version of the RFC, in
   which case the RFC Editor would use that to generate the PDF.

2.2.  Internet Drafts
<8>
   In addition to PDFs generated and published by the RFC Editor, the
   IETF tools community has also long supported PDF for Internet Drafts.
   Most RFCs start as Internet Drafts, edited by individual authors.
   The Internet Drafts submission tool at
   https://datatracker.ietf.org/submit/ accepts PDF and PostScript
   files in addition to the (required) text submission and (currently
   optional) XML.  If a PDF was not submitted for a particular version
   of an Internet Draft, the tools would generate one from the
   PostScript, HTML, or text.

3.  Options and Requirements for PDF RFCs
<9>
   This section lays out options and requirements for PDFs produced by
   the RFC Editor for RFCs.  There are two subsections: "Visible"
   options (Section 3.1) relate to how the PDF appears when it is
   viewed with a PDF viewer.  "Invisible" options (Section 3.2) affect
   the ability to process PDFs in other ways, but do not change the way
   the document looks.
<10>
   In many cases, the choice of PDF requirements is heavily influenced
   by the utility of available tools to create PDFs.  Most of the
   discussion of tooling is to be found in Section 4.



<11>
   NOTE: each option in this section should outline the nature of the
   design choice, outline the pros and cons, and make a recommendation.

3.1.  "Visible" requirements
<12>
   PDF supports rich visible layout of fixed-sized pages.

3.1.1.  General visible requirements
<13>
   To give RFCs a consistent 'look' and good style, the PDFs produced by
   the RFC Editor should have a clear, easy-to-read layout.  They should
   print well on the widest range of printers, and look good on displays
   of varying resolution.

3.1.2.  Page size, margins, headings
<14>
   A PDF file is laid out for a particular page size and set of margins,
   and any headers and footers are part of that layout.  Recommendations
   or ideas for further study:
<15>
   Page size  US Letter page size, but margins chosen so it will print
      and look good on A4 paper.
<16>
   Margins  The smallest margin consistent with above requirements.
<17>
   Headings  The same information from the text version of the document,
      but set in a smaller font in a lighter color.
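   The page-size recommendation above can be illustrated with a little
   arithmetic.  The following Python sketch is not part of any existing
   tool; the function names and the 36-point minimum margin are
   assumptions chosen for illustration.  It computes a text block that
   fits on both US Letter and A4 pages, and the margins that result
   when that block is centred on each page.

```python
# Page sizes in PostScript points (1 pt = 1/72 inch).
US_LETTER = (612.0, 792.0)   # 8.5 in x 11 in
A4 = (595.28, 841.89)        # 210 mm x 297 mm, rounded to 0.01 pt


def common_text_block(margin_pt):
    """Return (width, height) of a text block that fits on both page
    sizes while leaving at least margin_pt on every side."""
    width = min(US_LETTER[0], A4[0]) - 2 * margin_pt
    height = min(US_LETTER[1], A4[1]) - 2 * margin_pt
    return width, height


def margins_on(page, block):
    """Centre the block on the page; return (horizontal, vertical)
    margins in points."""
    return (page[0] - block[0]) / 2, (page[1] - block[1]) / 2


# 36 pt (half an inch) is an assumed minimum margin, not a value taken
# from this document.
block = common_text_block(36.0)
letter_margins = margins_on(US_LETTER, block)
a4_margins = margins_on(A4, block)
```

   The width is limited by A4 and the height by US Letter, so a layout
   built this way has slightly wider side margins on US Letter and
   taller top and bottom margins on A4.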

3.1.3.  Similarity to other outputs
<18>
   There is some advantage to having the PDF files look like the text or
   HTML renderings of the same document.  Even so, there are several
   options.  The PDF
<19>
   1.  could look like the text version of the document, or
<20>
   2.  could look like the text version of the document but with
       pictures rendered as pictures instead of using their ASCII-art
       equivalent, or
<21>
   3.  could look like the HTML version.
<22>
   (Note that options 1 and 2 are what the RFC Editor currently produces
   on its web site.)
<23>
   Recommendation: the PDF rendition should look like the HTML
   rendition, at least in spirit -- for example, visually searching or
   scanning should be facilitated.  The typeface and type size should
   be chosen to suit printing.

3.1.4.  Typeface choices
<24>
   A PDF may refer to a font by name, or it may use an embedded font.
   When a font is not embedded, a PDF viewer will attempt to locate a
   locally installed font of the same name.  If it cannot find an exact
   match, it will look for a "close match".  If a close match is not
   available, it will fall back to some other font.  This behavior is
   highly implementation dependent.
<25>
   Recommendation: for consistent viewing, all fonts should be embedded.
<26>
   In addition, since the HTML version of the document is being visually
   replicated, the font(s) chosen should have both variable width and
   constant width components, as well as bold and italic
   representations.
<27>
   Few fonts have glyphs for the entire repertoire of Unicode
   characters; for this purpose, the PDF generation tool may need a set
   of fonts and a way of choosing them.
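   One possible "way of choosing" is sketched below in Python.  The
   font names and the code point ranges are hypothetical and chosen
   only for illustration; a real generator would read each font's
   actual character coverage rather than use a hand-written table.
   The sketch walks an ordered list of fonts, picks the first one that
   covers each character, and groups the text into runs that share a
   font.

```python
# Hypothetical fonts and coverage ranges, in order of preference.
FONT_COVERAGE = [
    ("BodySerif", [(0x0020, 0x024F)]),                   # Latin
    ("BodyGreekCyrillic", [(0x0370, 0x052F)]),           # Greek, Cyrillic
    ("BodyCJK", [(0x3040, 0x30FF), (0x4E00, 0x9FFF)]),   # kana, CJK
]
FALLBACK_FONT = "LastResort"   # hypothetical catch-all font


def font_for(ch):
    """Return the first font whose coverage includes the character."""
    cp = ord(ch)
    for name, ranges in FONT_COVERAGE:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return FALLBACK_FONT


def split_into_runs(text):
    """Group consecutive characters that are set in the same font."""
    runs = []
    for ch in text:
        font = font_for(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

   Each run would then be emitted with its own embedded font, which is
   consistent with the recommendation that all fonts be embedded.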
<28>
   Recommendation: ... TBD ...
<29>
   For readability, the main body text should be in a serif font and the
   headings in a sans-serif font.
<30>
   Code, BNF, and other preformatted text could use a fixed-width font
   to help ensure alignment.

3.1.5.  Hyperlinks
<31>
   PDF supports hyperlinks both to sections of the same document and to
   other documents.
<32>
   Recommendation: All hyperlinks available in the HTML rendition of the
   RFC should also be visible and active in the PDF produced.

3.2.  "Invisible" options and requirements
<33>
   There are many things going on under the covers:
<34>
   o  What is usable in an RFC rendered as PDF?
<35>
   o  Where can we improve on the past?
<36>
   o  Where must we improve?



<37>
   o  What must be supported in PDF?
<38>
   These all turn into requirements for the conversion tools that are
   used to generate the PDF rendering of the visible representation.
<39>
   The areas discussed below are:
<40>
   1.  Internal Text Representation
<41>
   2.  Unicode support
<42>
   3.  Metadata
<43>
   4.  Document Structure
<44>
   5.  Tagged PDF
<45>
   6.  Embedded Files
<46>
   7.  Document Signatures

3.2.1.  Internal Text Representation
<47>
   The contents of a PDF file can be represented in many ways.  The PDF
   file could be generated:
<48>
   o  as an image of the visual representation, such as a picture (e.g.,
      a GIF) of the word 'IETF'
<49>
   o  by placing individual characters in position on the page, such as
      saying "put an 'F' here", then "put a 'T' before it", then "put
      an 'E' before that", then "put an 'I' before that" to render the
      word 'IETF'
<50>
   o  by placing words in position on the page, such as keeping the
      word 'IETF' together, and
<51>
   o  by using higher-level constructs for sets of words, such as
      keeping the sentence 'The Internet Engineering Task Force (IETF)
      supports the Internet.' together as a sentence.
<52>
   All of these end up with the same visual representation of the
   output.  However, each level has trade-offs for auxiliary uses of
   the format.  For example, the higher-level construct would allow you
   to search for the word "IETF" or for phrases including the word
   "IETF", whereas word placement would only allow you to search for
   "IETF" itself, and the other representations would not easily
   support search at all.  As another example, when using an annotation
   tool to review a PDF file, it is harder to place a comment on a
   location within an image than to attach a comment to a given word.
   Attaching a comment to a set of words (such as a bracketed set of
   words) is easier when using higher-level constructs.  Accessibility
   is another case where higher-level constructs are needed:
   text-to-speech tools need the sentences to be presented as a whole
   and in the proper order.
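   The search trade-off can be made concrete with a toy model.  The
   Python sketch below is only an illustration: the data layout and the
   function name are assumptions, and it models a deliberately naive
   search that looks inside each placed item separately.  Real PDF
   viewers apply heuristics to reassemble text and often do better than
   this.

```python
SENTENCE_TEXT = ("The Internet Engineering Task Force (IETF) "
                 "supports the Internet.")

# Each representation is a list of (position, text) items in the order
# in which they are painted, which need not be reading order.
GLYPHS = [(3, "F"), (2, "T"), (1, "E"), (0, "I")]   # character placement
WORDS = list(enumerate(SENTENCE_TEXT.split()))      # word placement
SENTENCE = [(0, SENTENCE_TEXT)]                     # higher-level construct


def can_find(items, phrase):
    """Naive search: the phrase must lie wholly inside one item."""
    return any(phrase in text for _, text in items)


results = {
    "glyphs, 'IETF'": can_find(GLYPHS, "IETF"),
    "words, 'IETF'": can_find(WORDS, "IETF"),
    "words, 'Task Force'": can_find(WORDS, "Task Force"),
    "sentence, 'Task Force'": can_find(SENTENCE, "Task Force"),
}
```

   Under this model only the sentence-level representation finds the
   phrase "Task Force", the word-level one finds "IETF" but no
   multi-word phrase, and the character-level one finds neither.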
<53>
   Requirement: ... TBD ...

3.2.2.  Unicode Support
<54>
   Unicode is to be fully supported in the RFC format, so the PDF
   rendering of RFCs must similarly have full support for Unicode.
   While Unicode is not required by PDF itself, certain PDF profiles
   require its support.
<55>
   Requirement: the generated PDF files must contain the full text
   exactly as it appears in the original XML (for running text) or
   within the SVG (for images).  (NOTE: What about text in image
   illustrations?)

3.2.3.  Metadata Support
<56>
   Metadata encodes information about the document, such as its
   authors, the document series, and the date created, using RDF with
   Dublin Core (and other) elements.  Having this metadata within the
   PDF file allows it to be extracted by tools that process the file.
   It can also include additional information such as pointers to where
   the document can be found on the RFC Editor web site.
<57>
   PDF supports embedded metadata using the Extensible Metadata
   Platform (XMP).  NOTE: Need a reference and explanation.
<58>
   Recommendation: The PDFs generated should have all of the metadata
   from the XML version embedded directly as XMP metadata, including the
   author and date information, the document series, and a URL for
   where the document can be retrieved.
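   As a rough illustration of what such metadata might look like, the
   Python sketch below builds a minimal XMP packet body with Dublin
   Core properties, using only the standard library.  The function
   name, the choice of dc:identifier to carry the retrieval URL, and
   the example URL are assumptions made for this sketch.  It omits the
   document series and the xpacket wrapper, and a production tool would
   more likely use a PDF library's XMP support.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix, local):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], local)


def build_xmp(title, authors, date, url):
    """Return an XMP metadata tree, serialized as a string."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"),
                         {q("rdf", "about"): ""})

    # dc:title is a language alternative (rdf:Alt).
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")),
                        q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"),
                  {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list (rdf:Seq) of names.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")),
                        q("rdf", "Seq"))
    for author in authors:
        ET.SubElement(seq, q("rdf", "li")).text = author

    # dc:date is an ordered list of dates.
    dates = ET.SubElement(ET.SubElement(desc, q("dc", "date")),
                          q("rdf", "Seq"))
    ET.SubElement(dates, q("rdf", "li")).text = date

    # Assumption: the retrieval URL is carried in dc:identifier.
    ET.SubElement(desc, q("dc", "identifier")).text = url
    return ET.tostring(xmpmeta, encoding="unicode")


xmp = build_xmp(
    "PDF for an RFC Series Output Document Format",
    ["T. Hansen", "L. Masinter", "M. Hardy"],
    "2014-06-26",
    "https://example.org/hypothetical/location",   # placeholder URL
)
```

   The resulting string would be stored in the PDF's metadata stream by
   whatever tool writes the file.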

3.2.4.  Document Structure Support
<59>
   The section structure of an RFC can be mapped into the PDF elements
   for the document structure.  This will allow the bookmark feature of
   PDF readers to be used to quickly access sections of the document.
<60>
   Requirement: The section structure of an RFC must be mapped into the
   PDF elements for the document structure.  This would include section
   headings for the boilerplate sections such as the Abstract, Status of
   This Memo, Table of Contents, and Authors' Addresses.
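   The mapping from section headings to a bookmark tree can be sketched
   as follows.  This Python fragment is illustrative only: the function
   names and the dictionary layout are assumptions, and writing the
   resulting tree into a PDF outline is left to the PDF library in use.
   Unnumbered boilerplate sections are treated as top-level entries.

```python
def depth_of(number):
    """'3.2.1.' -> 3, '4.' -> 1; unnumbered sections ('') -> 1."""
    return len([part for part in number.split(".") if part]) or 1


def build_outline(sections):
    """Turn a flat list of (number, title) pairs, in document order,
    into a nested list of {'title', 'children'} bookmark entries."""
    root = {"title": None, "children": []}
    stack = [root]   # stack[d] is the most recent entry at depth d
    for number, title in sections:
        depth = depth_of(number)
        node = {"title": (number + " " + title).strip(), "children": []}
        del stack[depth:]                  # close deeper or equal levels
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


outline = build_outline([
    ("", "Abstract"),
    ("", "Status of This Memo"),
    ("3.", "Options and Requirements for PDF RFCs"),
    ("3.1.", '"Visible" requirements'),
    ("3.1.1.", "General visible requirements"),
    ("3.2.", '"Invisible" options and requirements'),
    ("4.", "Tooling"),
])
```

   Each entry in the nested result corresponds to one bookmark, with
   subsections appearing as children of their parent section.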



3.2.5.  Tagged PDF
<61>
   ... say more about the use of alternative texts for images, tagging
   text spans and giving them an ID, and providing replacement texts for
   symbols and images
   ...
   hyperlinks within the document, hyperlinks to external locations,
   ...
   Where should hyperlinks to RFCs point? to the info page for the RFC?
   to the PDF version of the RFC?  (NOTE: the RFC Series Editor has
   stated a preference for them to point to the info page for the RFC.)
   ...
   Similar questions need to be answered for references to Internet
   Drafts: Where should hyperlinks to Internet Drafts point?  To the
   datatracker entry?  To the tools entry?  To a PDF version of the
   Internet Draft?
   ...
   a role-map should be provided here to map the logical tags found in
   the RFC XML to the standard tagset for PDF.  This would be included
   in the generated PDF.

3.2.6.  Embedded Files

3.2.6.1.  Extractable Code Segments and Artwork
<62>
   It has been suggested that the source input for code segments (e.g.,
   ABNF, C code, MIBs) be extractable from the PDF.  This capability
   might be supported through other mechanisms from the XML source
   files, but could also be supported within the PDF.  PDF/A-2 (based on
   PDF release 1.7) allows other PDF/A files to be embedded, while
   PDF/A-3 adds support for embedding arbitrary files.

3.2.6.2.  Extractable XML source
<63>
   Another suggestion that has been made is that the XML input file
   itself could be embedded within the PDF.  This would make the PDF
   file completely self-contained.

3.2.7.  Document Signatures
<64>
   PDF has supported file signatures since PDF 1.2.  It has been
   suggested that the PDF files be signed by the RFC Editor on creation.
   This would allow readers to verify that a PDF file was produced by
   the RFC Editor and has not been altered since.
<65>
   Recommendation: The RFC PDF documents created by the RFC Editor
   should be digitally signed.
<66>
   Recommendation: Internet drafts do not need to be digitally signed.



4.  Tooling
<67>
   NOTE: This section will talk about tools for creating, manipulating,
   and transforming PDF files, including those currently in use by the
   RFC Editor and for Internet Drafts, as well as outlining available
   PDF tools for various processes.
<68>
   During either phase: generation.
<69>
   During the I-D phase: xml2rfc, of course, for authors.  Copying from
   PDF files?  Review and comment.  Digital signature tools.  Comparing
   two PDF files (versions).
<70>
   During RFC publishing: xml2rfc.  Editing of PDF to correct layout
   errors.  Nits checking, checking conformance with PDF/A and PDF/UA.

5.  Picking a PDF Profile
<71>
   There are profiles of PDF for specific purposes, such as PDF/UA and
   PDF/A-3.
<72>
   NOTE: add reasoning here about the recommendations.
<73>
   Recommendation: use PDF/UA and also PDF/A-3.

6.  References

6.1.  Informative References

   [RFC3778]  Taft, E., Pravetz, J., Zilles, S., and L. Masinter, "The
              application/pdf Media Type", RFC 3778, May 2004.

   [RFC6949]  Flanagan, H. and N. Brownlee, "RFC Series Format
              Requirements and Future Development", RFC 6949, May 2013.

6.2.  URIs

   [1] https://github.com/masinter/pdfrfc

   [2] http://www.adobe.com/devnet/pdf/pdf_reference_archive.html

   [3] http://en.wikipedia.org/wiki/PDF

   [4] http://www.pdflib.com/fileadmin/pdflib/pdf/whitepaper/Whitepaper-
       Technical-Introduction-to-PDFA.pdf

   [5] http://www.pdfa.org/wp-content/uploads/2011/08/
       tn0003_metadata_in_pdfa-1_2008-03-128.pdf

   [6] http://www.pdfa.org/wp-content/uploads/2011/08/PDFA-in-
       a-Nutshell_1b.pdf

   [7] http://www.pdfa.org/2011/08/pdfa-%E2%80%93-a-look-at-the-
       technical-side/

   [8] http://pdf.editme.com/pdfa

Appendix A.  A Synopsis of PDF Format History
<74>
   [RFC3778] contains some history of PDF.  This appendix gives a
   capsule view, plus additional information on events that have
   occurred since the publication of [RFC3778].  NOTE: this appendix
   currently doesn't talk about the handoff of change control to ISO
   and the evolution of PDF as the ISO 32000 standard.  Plans are to
   update the application/pdf MIME registration to include this
   information, and then point to that.
<75>
   The Portable Document Format (PDF) family of document formats was
   invented by Adobe Systems in the early 1990s.  At the time, it was a
   proprietary format that underwent a variety of revisions that matched
   the release of different versions of the Adobe Acrobat products.  For
   example, Acrobat 1 supported PDF version 1.0, Acrobat 2 supported PDF
   version 1.1, Acrobat 5 supported PDF version 1.4, etc.  [2]
<76>
   Each release (and extension level) introduced new features.  For
   example, (1.0) character, word and image rendering,
   externally-referenced or embedded fonts, (1.1) passwords, encryption,
   device-independent color, (1.2) interactive forms, Unicode,
   signatures, compression, (1.3) web semantic capture, embedded files,
   Adobe JavaScript, (1.4) metadata streams, tagged PDF, (1.5)
   controllable hiding of sections, slideshows, (1.6) 3D artwork,
   OpenType font embedding, linking into embedded files, and (1.7) video
   and audio support.  After release 1.7, additional Extension Levels
   have been introduced.  Each release also enhanced previously
   supported features.  For example, encryption was introduced in 1.1,
   but 256-bit AES encryption wasn't supported until 1.7 extension
   level 3.  A PDF reader for PDF 1.1 is not able to read and display a
   PDF 1.7 file, but a PDF reader for PDF 1.7 can also handle all
   previous versions of PDF.  The Wikipedia page at [3] has a nice
   summary table going into further details.
<77>
   Certain profiles or subsets of PDF have been standardized.  PDF/X (X
   for Exchange), PDF/A (A for Archive), PDF/E (E for Engineering),
   PDF/VT (VT for Variable and Transactional), and PDF/UA (UA for
   Universal Access) all have ISO standards associated with them.  Of
   particular potential interest to the RFC community are PDF/A and
   PDF/UA.





A.1.  PDF/A
<78>
   PDF/A in turn has nuances, as there have been a couple of updates to
   it, with conformance levels within each version.  PDF/A-1 was based
   on PDF release 1.4.  PDF/A-2 was based on PDF release 1.7, and
   PDF/A-3 adds the embedding of arbitrary files.  PDF/A is considered a
   profile because it mandates that certain optional features be used.
   At a high level, the conformance levels are B (basic), U (mandatory
   Unicode mapping; not in PDF/A-1), and A (accessible).  The
   requirements for conformance level A are that the document structure
   be represented within the PDF (e.g., section headings, table cells,
   paragraph divisions), that tagged PDF be used (e.g., element
   anchors), and that language tags be used where appropriate.  When
   referring to PDF/A, you would give both the version and the
   conformance level.  So PDF/A-1a would be the profile for the
   Accessible conformance level of version 1 of PDF/A, which was based
   on PDF 1.4.
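   The naming scheme can be summarized in a few lines of Python.  This
   is a sketch for illustration, with a hypothetical function name; it
   encodes only the versions and levels described above, plus the
   assumption that PDF/A-3, like PDF/A-2, is based on PDF 1.7.

```python
import re

LEVELS = {"a": "accessible", "b": "basic", "u": "Unicode"}
BASE_PDF_VERSION = {1: "1.4", 2: "1.7", 3: "1.7"}


def parse_pdfa(designation):
    """Split a designation such as 'PDF/A-1a' into version and level."""
    match = re.fullmatch(r"PDF/A-([123])([abuABU])", designation)
    if match is None:
        raise ValueError("not a PDF/A designation: %r" % designation)
    part = int(match.group(1))
    level = match.group(2).lower()
    if part == 1 and level == "u":
        raise ValueError("conformance level U is not defined for PDF/A-1")
    return {
        "part": part,
        "level": LEVELS[level],
        "based_on_pdf": BASE_PDF_VERSION[part],
    }


example = parse_pdfa("PDF/A-1a")
```

   A checker could use a table like this to report which PDF release a
   claimed profile implies before running a full conformance test.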

A.2.  PDF/UA
<79>
   The PDF/UA (Universal Access) profile is orthogonal to the other
   profiles, specifying user accessibility requirements.  It places some
   restrictions on the other profiles, such as requiring the use of
   higher-level constructs for the textual representation, and adds
   requirements for programmatic access (think automatic readers for
   the blind).

A.3.  Additional Reading
<80>
   [4] [5] [6] [7] [8]

Authors' Addresses

   Tony Hansen (editor)
   AT&T Laboratories
   200 Laurel Ave. South
   Middletown, NJ  07748
   USA

   Email: tony+rfc2pdf@maillennium.att.com


   Larry Masinter
   Adobe Systems
   345 Park Ave
   San Jose, CA  95110
   USA

   Email: masinter@adobe.com
   URI:   http://larry.masinter.net


   Matthew Hardy
   Adobe Systems
   345 Park Ave
   San Jose, CA  95110
   USA

   Email: mahardy@adobe.com