People often ask how we do what we do. This series explains…
Just as with our books, these magazines are written in plain text, using PML, our XML-based markup language. And, just like the books, we collaborate by checking the various components in and out of our version control system.
We use Jim Weirich’s Rake utility to coordinate all the tasks associated with building and distributing the content. Typing rake magazine.pdf, for example, builds the PDF version—you can build epub and mobi versions too. Once we’re happy with an issue, rake upload creates all three versions and uploads them to S3 for your reading pleasure. Other Rake tasks do things like generate cover images for the online site, create a Textile format table of contents for the online description, and so on.
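As a rough illustration of how tasks like these can hang together, here is a minimal Rakefile sketch. Only the `magazine.pdf` and `upload` task names come from the description above; the `FORMATS`, `TARGETS`, and `BUILD_LOG` names are assumptions, and each task body just records that it ran instead of calling real build or upload tools.

```ruby
require "rake"

# Hypothetical names throughout: only `rake magazine.pdf` and `rake upload`
# are taken from the article. Real tasks would run the build tools and an
# S3 upload instead of appending to a log.
FORMATS = %w[pdf epub mobi].freeze
TARGETS = FORMATS.map { |ext| "magazine.#{ext}" }

# Stand-in for real work: remember which steps ran, in order.
BUILD_LOG = []

TARGETS.each do |target|
  desc "Build #{target}"
  task target do |t|
    BUILD_LOG << t.name # a real task would shell out to the build tools here
  end
end

desc "Build all three versions, then upload them"
task upload: TARGETS do |t|
  BUILD_LOG << t.name # a real task would push the finished files to S3 here
end
```

Because `upload` lists the three build targets as prerequisites, Rake runs them first and each at most once, so asking for `upload` always starts from freshly built files.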
Our PML markup is simple but flexible—common stuff is easy and relatively terse. Here’s the start of the body of this article:
We can extend the markup to add custom tags for any issue (in the same way we can extend it for each book). In this issue, for example, we needed to draw a Sudoku diagram. We added markup that let us do:
From PML to PDF (and the rest…)
The first step in creating a readable document is to deal with any source code in program listings. A preprocessor (written in Ruby) scans the PML files, combining them into a single XML document. Along the way, it looks for our special code inclusion tags, finds the appropriate source files, syntax highlights them, and inserts the result as valid XML into the combined document.
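The code-inclusion step could be sketched roughly as follows. The `<code file="..."/>` tag, the `<programlisting>` output element, and the `inline_code` helper are all assumptions made for illustration, since the real PML tags are not shown here. REXML stands in for whatever XML library the actual preprocessor uses, and syntax highlighting is left out entirely.

```ruby
require "rexml/document"

# Replace every <code file="..."/> element (a hypothetical tag) with a
# <programlisting> element (also hypothetical) holding that file's text.
# `sources` maps file names to contents so the sketch runs without disk access.
def inline_code(xml, sources)
  doc = REXML::Document.new(xml)
  # match returns an array, so replacing nodes while looping over it is safe
  REXML::XPath.match(doc, "//code[@file]").each do |node|
    file = node.attributes["file"]
    listing = REXML::Element.new("programlisting")
    listing.add_attribute("source", file)
    # Assigning text lets REXML escape characters such as < and & on output,
    # so the combined document stays valid XML
    listing.text = sources.fetch(file)
    node.replace_with(listing)
  end
  doc.to_s
end
```

A real preprocessor would read the source files from disk and emit highlighted markup rather than plain text, but the shape is the same: find the inclusion tag, load the file, and splice in well-formed XML.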
Once we have this combined document, we create PDF, mobi, and epub files from it. Let’s look at PDF first.
For the print books, we currently use XSLT to take this document and convert it to TeX. On the TeX side, we use a combination of the memoir class and a whole bunch of our own style macros to create the final product. For the magazines we do it differently: we’re experimenting with XSL-FO. An XSLT transform takes our XML and converts it to formatting objects, which then get rendered into PDF. Right now we’re using RenderX, which does a nice job of PDF generation.
For mobi files (used by the Kindle), we use a different XSLT transform to convert our XML into a very simple (and very nonstandard) HTML. This includes some special tags and id attributes that tell the Kindle reader things like the location of the table of contents. This HTML then goes through the html2mobi utility, part of the MobiPerl package. (We're also looking at using Calibre in the future.) One of the joys of the Kindle is that the internal markup doesn't seem to be published anywhere, so there's a whole bunch of experimentation and guessing involved.
For epubs, we use a third XSLT transform to create an IDPF-compliant directory tree containing the various OPF files and content. However, we've noticed that feeding epub readers a book built from a single HTML document bogs them down terribly, so before we package the content, we run a utility that uses the Ruby Nokogiri library to split the master document into multiple sections, rebuilding the various resource indexes to reflect this new structure. (We originally did this splitting in the XSLT, the way the DocBook transforms do, but this is horrendous as the splits are not hierarchical. If you want to see some unbelievably complex XSLT, look at the way DocBook creates epubs.)
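The splitting step can be pictured with a small sketch like the one below. It is not the utility described above: it uses REXML rather than Nokogiri so that it depends only on libraries that ship with Ruby, and the `div class="section"` structure, the `split_sections` helper, and the `sectionNNN.html` file names are all assumptions. It returns the new pages plus a list of file names and titles, which is the kind of information a packager would need to rebuild the resource indexes.

```ruby
require "rexml/document"

# Split one master document into a page per top-level section.
# Returns [files, manifest]: files maps a file name to its XML text, and
# manifest is an ordered list of [file name, title] pairs.
# The div/class layout and file naming scheme are hypothetical.
def split_sections(master_xml)
  doc = REXML::Document.new(master_xml)
  files = {}
  manifest = []
  sections = REXML::XPath.match(doc, "/html/body/div[@class='section']")
  sections.each_with_index do |section, index|
    name = format("section%03d.html", index + 1)
    title = REXML::XPath.first(section, "h1")&.text.to_s
    # Build each page as a document so titles are escaped correctly
    page = REXML::Document.new("<html><head><title/></head><body/></html>")
    page.root.elements["head/title"].text = title
    page.root.elements["body"].add_element(section.deep_clone)
    files[name] = page.to_s
    manifest << [name, title]
  end
  [files, manifest]
end
```

Keeping the manifest as plain data makes it easy to generate whichever index files the target format needs from the same list.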
In the middle of these steps, we also use a Ruby program to convert the image files to a format suitable for the target readers.
Of course, this is what we did today. Tomorrow, it'll probably be different.
Dave Thomas is one of the Pragmatic Programmers.