How Do We...?

How Gerbils Make Sausage

by Dave Thomas

  People often ask how we do what we do. This series explains…  
This month's question: How do you create book-specific markup?

As we discussed last month, one of our goals is to use proper semantic markup for our books. This means that we have to accommodate books that have specific markup requirements.

Last month we looked at the technique we use to allow what is effectively subclassing of our book DTDs (the specifications of the markup permitted for each book). Using this, we can have a single master DTD for all the regular markup and per-book DTDs for the book-specific stuff. This lets us validate that each book is syntactically correct during the build. But a book’s no good unless you can read it, so this month we’ll look at the next step: how we support per-book specialized formatting. This is what lets us convert the book-specific markup into the correct representations for paper books, PDFs, and the various eBook formats.

We convert our books from XML into the required output format using XSLT, a language for transforming XML documents. An XSLT processor reads a stylesheet made up of pattern-matching templates and applies the matching ones to a source XML document. We use it to generate XML, HTML, and TeX versions of our books.

Now, the Rails book talks a lot about key/value pairs—elements of Ruby hashes. After I wrote these inline a couple of times, I realized that I really should be using semantic markup for them. So I added a new PML element just for the book. You write “Saying <hashentry key=":id">product</hashentry> is idiomatic” which renders as “Saying :id => product is idiomatic.”
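
To make the pattern-matching idea concrete, here is a minimal Python sketch of the same behavior. It is not XSLT and not part of the build: it only mimics a rule table keyed by element name, applied recursively to a parsed fragment. The names RULES, render, render_children, and render_hashentry are invented for illustration, as is the enclosing p element; only the hashentry element, its key attribute, and the rendered text come from the example above.

```python
import xml.etree.ElementTree as ET

def render_children(elem):
    """Render an element's text and child elements in document order."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child))
        parts.append(child.tail or "")
    return "".join(parts)

def render_hashentry(elem):
    # Hypothetical rule: the key attribute, then " => ", then the content.
    return f"{elem.get('key')} => {render_children(elem)}"

# Rule table keyed by element name (a stand-in for XSLT match patterns).
RULES = {"hashentry": render_hashentry}

def render(elem):
    """Apply the rule for this element, or fall back to its plain content."""
    rule = RULES.get(elem.tag, render_children)
    return rule(elem)

source = '<p>Saying <hashentry key=":id">product</hashentry> is idiomatic</p>'
print(render(ET.fromstring(source)))  # Saying :id => product is idiomatic
```

Elements without a rule simply pass their content through, which is roughly what happens to text when no more specific template matches.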

So how do we make this particular transform local to just the Rails book? We actually use something vaguely reminiscent of Ruby’s mixins.

Each book’s source tree contains a set of XSLT transforms, one for each of our target formats. When we first set up a title, the transform (for example) for the mobi format is trivial:

 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="../../../../../Shared/xml/ppb2mobi.xsl"/>
 </xsl:stylesheet>

All it does is load up the global set of mobi transforms. In effect, the default transforms are mixed into the book-local XSLT file.

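Then, when we want to add new markup, we simply add the transform to the book-local file, after the import. XSLT requires xsl:import elements to come before any other top-level element, and templates defined in the importing file take precedence over imported templates that match the same element. (Apologies for the low-level HTML: the Kindle doesn’t support stylesheets.)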

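 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="../../../../../Shared/xml/ppb2mobi.xsl"/>
  <xsl:template match="hashentry">
   <code><xsl:value-of select="@key"/></code>
   <xsl:text> </xsl:text>
   <code>=&gt;</code>
   <xsl:text> </xsl:text>
   <code><xsl:apply-templates/></code>
  </xsl:template>
 </xsl:stylesheet>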
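
The override behavior can be modeled loosely in Python with collections.ChainMap, where a lookup consults the book-local rules first and falls back to the shared ones. This is an analogy only, not a description of how an XSLT processor works internally. The rule names and the formatting functions below are made up for the sketch, including the hypothetical local override of emph.

```python
from collections import ChainMap

# Shared rules: stand-ins for the templates in the global mobi stylesheet.
shared_rules = {
    "emph": lambda text: f"<i>{text}</i>",
    "code": lambda text: f"<code>{text}</code>",
}

# Book-local rules: one addition and one override, both hypothetical.
local_rules = {
    "hashentry": lambda text: f"<code>{text}</code>",
    "emph": lambda text: f"<b>{text}</b>",
}

# ChainMap searches its maps left to right, so local rules win.
rules = ChainMap(local_rules, shared_rules)

print(rules["emph"]("hi"))     # <b>hi</b> (local override)
print(rules["code"]("x = 1"))  # <code>x = 1</code> (inherited from shared)
print(sorted(rules))           # ['code', 'emph', 'hashentry']
```

A book that defines no local rules sees exactly the shared set, which matches the trivial import-only stylesheet shown earlier.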

We use TeX to create our printed books and DRM-free PDFs. In this case, the XSLT transforms the XML to TeX source, making use of macros in our TeX stylesheet.

 <xsl:template match="hashentry">
  <xsl:call-template name="scape">
   <xsl:with-param name="string">
    <xsl:value-of select="@key"/>
   </xsl:with-param>
  </xsl:call-template>
  <!-- ... -->
 </xsl:template>

Because the TeX macro \hashentry is not one of our standard ones, we have to define it locally for the book. For TeX, we use a slightly different approach to get book-local functionality.

TeX supports the idea of a load path—a list of directories that TeX will search when loading files. All of our regular TeX macro packages are stored in a shared directory that appears last in this search path. And many of these macro packages are simply empty placeholders, loaded up during formatting at strategic places but having no effect on the build.

But… we include a book-local directory close to the front of TeX’s search path. If there’s a file in that directory with the same name as a global one, TeX will load it in preference. And that lets us add book-local TeX formatting. We just put formatting macros into a file called local.tex in that directory, and those macros are available during that book’s build. For the \hashentry macro, the file contains:


(The \, sequence is a narrow space, which formats the entry nicely.)
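
The first-match-wins lookup described above can be sketched in a few lines of Python. This is a simplified model, not TeX's actual file search, and it assumes the shared directory holds an empty placeholder named local.tex. The function find_on_path, the directory names book and shared, and the file common.tex are all hypothetical.

```python
import tempfile
from pathlib import Path

def find_on_path(filename, search_dirs):
    """Return the first existing match for filename, scanning dirs in order."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None

with tempfile.TemporaryDirectory() as root:
    book_dir = Path(root) / "book"
    shared_dir = Path(root) / "shared"
    book_dir.mkdir()
    shared_dir.mkdir()

    # The shared copy is an empty placeholder; the book copy defines macros.
    (shared_dir / "local.tex").write_text("")
    (shared_dir / "common.tex").write_text("% shared macros\n")
    (book_dir / "local.tex").write_text("% book-specific macros\n")

    search_path = [book_dir, shared_dir]  # book-local first, shared last
    chosen_local = find_on_path("local.tex", search_path)
    chosen_common = find_on_path("common.tex", search_path)
    print(chosen_local.parent.name)   # book
    print(chosen_common.parent.name)  # shared
```

Because the book directory is searched first, its local.tex shadows the shared placeholder, while files that exist only in the shared directory are still found.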

Dave Thomas is one of the Pragmatic Programmers.