How a tax form kludge gifted the world 25 joyous years of PDF
A worthy lesson in nifty programming and embracing standards
HTML is the world's most common digital document file format. However, it's not the one everyone turns to when they want to create a precise document that looks, prints and behaves the same on any platform on any device. And it's hardly the format of choice for immediate offline reading, easy sharing or simple portability.
For that, you need PDF.
All Adobe Acrobat product boxes from version 1 to DC
The Portable Document Format, first unleashed by Adobe upon a baffled public exactly a quarter century ago, revolutionised documentation, publishing, archiving and prepress, and brought everyday organisations closer to the "paperless office" than XML ever did. And yet it has only just recently been upgraded to version 2.0.
It all began with a demo that went wrong, according to Adobe co-founder John Warnock.
The full history of how and why Steve Jobs introduced typography into computing is the stuff of Apple fanboi legend. The bit that concerns us here is Jobs' decision to embed Adobe's fledgling PostScript page description language into Apple's first laser printer, the LaserWriter, in 1985.
Naturally, both Apple and Adobe wanted to show off the LaserWriter's abilities with some sample files. Warnock thought it would be great to have the printer producing something complex but familiar so he personally hand-programmed an IRS tax form in PostScript. At the time, PostScript was still text-based and a bit clunky, and all those boxes and form fields required Warnock to code in lots of subroutines and utilities.
The page took 2 minutes 45 seconds to print. Jobs was horrified at how slow that was.
Looking over his lines of PostScript later, Warnock tried a different tack. As he recounted to digital prepress newsletter The Seybold Report in 2001: "You can redefine the operators to have different semantics than the original operators. So I took all of the basic graphics commands in my original program, reprogrammed just a capture of their parameters and wrote those parameters out to a file." This effectively "flattened" the file out.
His new version of the IRS tax form printed on the LaserWriter in 22 seconds.
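For the curious, the flattening trick can be sketched in a few lines of Python rather than PostScript. This is only an illustration of the idea Warnock describes, not his code: the names `Recorder`, `box`, `form`, `flatten` and `to_text` are all made up for the example, and the "form" is a toy grid of boxes. The drawing program is written with subroutines and loops, but the graphics operators it calls have been swapped for versions that merely capture their parameters, so what comes out is a flat list of primitive calls with no program logic left in it.

```python
class Recorder:
    """Hypothetical stand-in for the redefined graphics operators.

    Nothing is drawn: each operator only captures the parameters
    it was called with, in the order the calls happen.
    """

    def __init__(self):
        self.ops = []

    def moveto(self, x, y):
        self.ops.append(("moveto", x, y))

    def lineto(self, x, y):
        self.ops.append(("lineto", x, y))

    def stroke(self):
        self.ops.append(("stroke",))


def box(g, x, y, w, h):
    """A subroutine of the kind a hand-coded form is full of."""
    g.moveto(x, y)
    g.lineto(x + w, y)
    g.lineto(x + w, y + h)
    g.lineto(x, y + h)
    g.lineto(x, y)
    g.stroke()


def form(g):
    """A toy 'tax form': three rows of two boxes, built with loops."""
    for row in range(3):
        for col in range(2):
            box(g, 10 + col * 100, 10 + row * 20, 90, 15)


def flatten(program):
    """Run the program against the recording operators and return
    the flat list of primitive calls it produced."""
    recorder = Recorder()
    program(recorder)
    return recorder.ops


def to_text(ops):
    """Write the captured calls out PostScript-style, operands first."""
    lines = []
    for name, *args in ops:
        lines.append(" ".join([*map(str, args), name]))
    return "\n".join(lines)


ops = flatten(form)
print(to_text(ops))
```

The output contains no loops and no subroutine calls, only coordinates and operator names, which is why a printer (or, later, a screen viewer) has so much less work to do with it.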
This flattening technique became known on the PostScript team as a "graph binder" but it remained a dormant technology within Adobe for several years afterwards. All Warnock was interested in at the time was to ensure that PostScript-based laser printers sold well. Adobe went on to experiment with other uses for PostScript, including as an imaging model within a graphics engine system called Display PostScript (DPS) but no one showed much interest in it except for Steve Jobs, who built it into his NeXT computers.
By 1991, with the proliferation of office networks, Warnock began thinking about using the graph binder to flatten general document files so that they would be easy to interpret and render across platforms. They wouldn't even need the full PostScript language.
He wrote a paper on how this would work, naming the project "Camelot". Reading the text today, you can't fail to note how its tone evokes the spirit of the age, at a time when the internet was strictly available only to academics, the military and a handful of dedicated nerds. Referring to the technology as "Interchange PostScript" (IPS), Warnock conjures a bright new future that breaks beyond the limitations of fax machines "to produce remote paper" and boasts that PostScript "has been implemented on over 100 commercially available printer products".
"Imagine being able to send full text and graphics documents (newspapers, magazine articles, technical manuals etc.) over electronic mail distribution networks. These documents could be viewed on any machine and any selected document could be printed locally. This capability would truly change the way information is managed. Large centrally maintained databases of documents could be accessed remotely and selectively printed remotely. This would save millions of dollars in document inventory costs."
Warnock presented the concept in public at that year's Seybold Conference in San Jose, by which time the name "Camelot" had been changed to "Carousel". One can only imagine the relief of Adobe's French-language promoters in later years to know how close they came to having to sell a product whose name was slang for "junk".
Fired up by positive feedback, he established a small team of programmers who successfully produced a prototype engine that could quickly render to a display the files that had been flattened by the graph binder.
He then took on two key computer scientists to fine-tune the system. Doug Brotz was charged with finding a way for the graph binder to work within a PostScript interpreter and deal with all of Adobe's font technologies at the time (Type 1, Type 2 and Type 3). Peter Hibbard was given the challenge of devising a flexible and extensible file format, which he did with COS – the foundation for the PDF language and data structures.
Quite possibly the first software package in the world to provide Help in a PDF file
The fruit of their labour was announced at Comdex Fall in 1992, winning a Best of Comdex award.
Adobe launched it as a commercial product on 15 June 1993. It had undergone another name change and was now a package of utilities with the moniker "Acrobat". The pack principally comprised a virtual printer driver for generating PostScript files from any application, Acrobat Distiller for converting the PostScript to the new interchange format, Acrobat Exchange for converting, viewing and printing these files, and Acrobat Reader for viewing and printing only. The interchange file format itself was no longer called IPS but "PDF" – the Portable Document Format.
Warnock's choice of Seybold as a venue for announcing his idea suggests he had a rather different concept of Acrobat's target market to Adobe's subsequent corporate view upon launch. Seybold was a conference for the publishing industry, after all, but Adobe's product marketing seemed to focus on businesses looking to achieve "the paperless office", as its launch video – complete with big-haired, '90s-necktied yuppie types – made plain.
Indeed, the promo text on early product boxes seemed to present a very different scenario to Warnock's predicted publishing revolution, preferring instead to hail Acrobat as a means of sharing documents you create in the market-leading WordPerfect 5.1 with colleagues foolish enough to be using some other silly word processor (such as, oh I dunno, the fledgling pretender Microsoft Word). PDF files could include internal hyperlinks and bookmarks, and even contain embedded fonts – crucial to the exchange format's core functionality.
Basic commenting in PDF 1.2 allowed everyone involved in a document sign-off round-robin to have a pop via email attachment
PDF did not set the world alight that year. Nor the following year. In fact, PDF would struggle to be regarded as much more than a curiosity until the late 1990s, no thanks to Adobe's pricing strategy: a Personal edition of Acrobat in 1993 cost $695, the Network edition coming in at $2,495. Colleagues who just wanted Acrobat Reader to view and print your PDF files were charged $50.
Also hurting Adobe's pitch was the way the company rapidly developed the capabilities of PDF with each subsequent upgrade to the Acrobat package, especially in its early years. While it is common enough for software application updates to demand updated file formats to support them, with Acrobat and PDF the process became relentlessly and unforgivingly one-way.
Worse for the marketing bods, someone somewhere at Adobe didn't think it worth synchronising version numbering between the application and file format. So when Acrobat 2.0 came out in November 1994, adding functionality such as external links, notes and basic password security, the file format was labelled PDF 1.1. Acrobat 3.0, released in 1996, upgraded the format to PDF 1.2. It has been out of kilter ever since.
If you think this is intuitive, then you'll already realise that Acrobat 8.0 creates PDF 1.7 files – as well as 1.6 and previous versions, as indeed does Acrobat XI but definitely not Acrobat 7.0 – but most normal people without such personality disorders find the system utterly incomprehensible without reference to a wall chart and a stick.
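In lieu of the wall chart, here is a rough sketch in Python. It assumes two things beyond what is stated above: that a PDF file announces its version in a `%PDF-1.x` header at the very start of the file, and that for Acrobat 1 through 8 the file format trails the application by exactly one (Acrobat 2.0 with PDF 1.1, Acrobat 8.0 with PDF 1.7). The function names are invented for the example, and the rule deliberately refuses anything outside PDF 1.0 to 1.7, since the pattern does not hold for later Acrobat releases.

```python
import re


def pdf_header_version(data: bytes) -> str:
    """Return the version string from a '%PDF-1.x' header.

    Assumes the header sits at the very start of the data; files
    with leading junk are rejected rather than guessed at.
    """
    match = re.match(rb"%PDF-(\d\.\d)", data)
    if match is None:
        raise ValueError("no %PDF- header found")
    return match.group(1).decode("ascii")


def first_acrobat_for(pdf_version: str) -> int:
    """Return the Acrobat major version that introduced a PDF 1.x
    format, using the off-by-one pattern for PDF 1.0 to 1.7 only."""
    major, minor = pdf_version.split(".")
    if major != "1" or not 0 <= int(minor) <= 7:
        raise ValueError(f"outside the PDF 1.0-1.7 range: {pdf_version}")
    return int(minor) + 1


sample = b"%PDF-1.4\n% rest of the file would follow here\n"
version = pdf_header_version(sample)
print(f"PDF {version} arrived with Acrobat {first_acrobat_for(version)}.0")
```

Treat the header as a hint rather than gospel: a file can declare a different version elsewhere in its structure, which this sketch makes no attempt to read.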
It was PDF 1.2 that nudged the publishing industry awake, as Warnock had predicted. This version of the file format added support for the CMYK process colour space alongside additional "spot" ink channels, plus other prepress-specific specifications such as OPI (where low-resolution image placeholders were automatically replaced by their high resolution versions during film output), halftone functions and overprint support.
Prepress software developers began writing plugins for Acrobat; PDF support began being added to popular raster image processor (RIP) utilities for film-making; small bureaus and big press companies alike began getting excited about the potential of a file submission format that didn't involve habitually phoning customers up to tell them they'd forgotten to send all the picture files or were using a font that nobody had heard of.
Enhanced multi-level internal links in PDF 1.3 transformed Bookmarks into fully featured tables of contents
A consortium of prepress organisations convened in Ghent, Belgium, in 1998, hammered out which PDF export settings were most suitable for press output and announced the result as an industry standard called "PDF/X". The meaning of the X is lost to time but probably has little more significance than the fact that it was fashionable to put an "X" in company and product names in the late 1990s.
Still, Adobe did not have the file interchange market all to itself. The second half of the decade saw the rise of many challengers to PDF, most memorably No Hands Software's Common Ground, WordPerfect's Envoy and AT&T's DjVu. Adobe's response was to cut its losses by ditching Acrobat's Unix edition, to give away its Adobe PDF Reader utility for free and to focus on the two areas in which it had enjoyed success: prepress, as outlined above, and the emerging World Wide Web.
You see, Acrobat 3.0 had sneakily introduced a Reader plug-in for the leading web browser of the day, Netscape Navigator. Either by accident or design, Adobe had stumbled upon one definition of a computing ‘standard’: something that everyone else is using already. PDF became an early web standard while still being a proprietary format.
PDF 1.3, introduced in April 1999, nailed it for the print industry with smooth blends, improved font, colour space and OPI support, and a maximum page size increased to 5 metres. By 2001, the industry had agreed upon an updated PDF/X-1a format based on PDF 1.3, upon which the vast majority of printing companies insist to this very day. For office workers, PDF 1.3 also added an annotation layer in which users could insert corrections and comments.
If the file format was a hit, Adobe’s upgraded Acrobat package most definitely wasn’t. Despite the Windows edition adding Microsoft Office integration and a nifty function for slurping entire web sites into a single multipage PDF, Acrobat 4.0 was a bug-riddled hell-hole. In a hilarious Oracle-style fit of cheek, Adobe then tried to sell the bug-fix as version 4.05 before backing down and sending it out free to registered users – four months later, if you were in Europe.
Acrobat 5.0's web capture utility was available for Windows only for a painfully long time. Adobe hinted that Mac users weren't buying enough copies of Acrobat to make it worth their while
With the turn of the Millennium, Adobe finally cracked a rendering problem it had been struggling with since the beginning. PDF had always been based on PostScript, and PostScript never supported anything but rigorously opaque text, pictures and colours. PDF 1.4 at last freed itself of the PostScript legacy and could support native transparency – that is, without faking it by converting clever on-screen transparent effects into a kludgy flat jigsaw of re-rendered image blocks.
For a while, only Adobe Illustrator 9 could export to PDF 1.4 but the eventual release of Acrobat 5.0 in May 2001 revealed other surprises that were to establish it as an everyday office file format. You could create, distribute and complete forms as PDFs; ‘tagging’ meant you could build logical structure into the content; MS Office integration was enhanced; Acrobat itself even started looking like MS Office.
Adobe then continued to tweak the format for a while. In 2003, PDF 1.5 added support for layers, better tagging and better file compression. In 2005, PDF 1.6 could act as a container for other files in other formats (PDF Portfolio), include 3D data and embed native OpenType fonts.
Adobe then began to lose interest in having to keep cajoling users into accepting new versions of the format, with only relatively modest enhancements to security and commenting being added to PDF 1.7 in 2006. Acrobat 8.0 even maintained PDF 1.6 as its default format, with 1.7 offered as an option, recognising that each incremental format update had the effect of pumping up the file size with support for smart features that maybe not all users appreciated or even noticed.
If you want to make your PDF files smaller, Adobe suggests ditching the bloat that came with later versions of PDF
In January 2008, the company handed the format over to the International Organization for Standardization (ISO), where it is known by the catchy title ISO 32000-1:2008.
For most people, this is when the PDF revolution really began. Prior to this, you'd send a PDF into the ether and hope that recipients would have an appropriate PDF Reader capable of opening it. To its credit, the ISO did the single most important thing to PDF that's essential to establishing any popular standard: absolutely nothing. Once Adobe had stopped buggering around with it, PDF became regarded as a stable and reliable format that software developers could feel confident in incorporating into their own products. Ten years of stability eventually put an end to annoying "You need to update Adobe Reader" messages and even the worst web browser on the planet – Microsoft Edge – can open PDFs directly without argument.
Adobe's apparently magnanimous gesture in relinquishing exclusive ownership has helped keep PDF ahead of modern XML alternatives from Microsoft and others, especially the OpenXPS format, in terms of popular use. Warnock is known to be dismissive of arguments that favour XML over PDF, arguing that since PDF is already an open format, compressed, structured, searchable and workflowable, there's no point in reproducing the same features in XML except to show off how clever you are.
Being around longer than the rest has also ensured PDF's adoption as a standard more deeply across certain industries and applications. As well as a growing set of PDF/X standards for prepress, there is PDF/VT for variable and transactional printing, PDF/UA for universal accessibility by people with disabilities, PDF/E for geospatial, construction and manufacturing document workflows, and PDF/A for long-term archival use.
Crucifixes at the ready ... Acrobat X Pro made this pretty PDF Portfolio interface possible with the aid of (eek) Flash
That said, the format is overdue a refresh to meet the needs of emerging digital applications that have developed since 2008. Content creators have long been demanding a version of PDF that supports embedded HTML5-based media, interactivity and animation, for example. However, PDF also needs to better meet the needs of assistive technologies natively with metadata, and this was the principal focus in the development of PDF version 2.0. The specification was published last summer as ISO 32000-2:2017.
Ultimately, it is up to the ISO and Adobe as key advisor to keep PDF relevant, despite its long history. A format that's popular one day can decline in use very quickly for all sorts of reasons, and users can be very unforgiving. And as Adobe itself knows too well, support for a "standard" file format that once ruled the online media world can vanish in a (ho ho) flash. ®
Perhaps you will enjoy this Adobe-made promo video from 2015 of apparently genuine people making really bad guesses at what "PDF" stands for.