How Do We...?

How Gerbils Make Sausage

by Dave Thomas

  People often ask how we do what we do. This series explains…  
This month's question: How do you create book-specific markup?

We create our books using a custom XML-based markup language called PML (Pragmatic Markup Language, natch). We then process this to create the printed, PDF, epub, and mobi versions of our books. (When we remember, we also use the same source to update the book’s table of contents in our online store.)

Each book project starts out with the standard set of markup tags. We try hard to use semantic markup, rather than layout markup, so authors use tags such as <methodname> and <keyword> rather than <code> and <b>. And we encourage authors to try to work with that markup: over the years, it has grown to encompass most of the things we need to create our books. But every now and then an author needs something special—something not covered by the standard markup. For example, when writing Rails Recipes, Chad Fowler wanted to be able to write Arabic text for an example that illustrated internationalization.
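As a rough illustration of the difference, here is the same sentence marked up both ways. The sentence is invented for this example; only the tag names come from the paragraph above.

 <!-- semantic markup: says what each piece of text is -->
 <p>Call <methodname>save</methodname> inside a <keyword>begin</keyword> block.</p>

 <!-- layout markup: says only how each piece should look -->
 <p>Call <code>save</code> inside a <b>begin</b> block.</p>

With the semantic version, the decision about how a method name looks in print, PDF, or epub stays in the toolchain rather than in the author's text.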

So, the question is: how do we support custom markup for each individual book, while still keeping the whole build system standardized? It all comes down to some careful management of search paths, combined with some cool XSLT hacks.

Every book starts out as a clone of a single master book project in our repository. When authors check their books out, they find two main top-level directories. One, called Book, is where they do their work. The other, called PPStuff, is a read-only version of our toolset. The PPStuff files are never changed for a particular book—all the per-book customization occurs below the Book directory.

Let’s start with the DTD, as this defines the markup that’s supported for a particular book. Each book defaults to using the standard entities and schema that we provide, but it can also override them and add its own. We do that using a trick that effectively lets us subclass the DTD for each book.

A couple of levels deep in the Book tree you’ll find the DTD that our tools use. It looks something like this:

 <!-- add extra entities here -->
 <!ENTITY % PPBookDTD SYSTEM "../../../PPStuff/util/xml/ppbook.dtd">
 %PPBookDTD;
 <!-- add extra elements here -->

The two lines in the middle load up the global DTD from the shared PPStuff tree. That’s where the standard markup is defined. But an XML parser binds an entity to the first declaration it encounters and ignores any later ones. So if the DTD in the Book tree defines entities before it includes the master DTD, those entities take precedence: the local DTD can override our normal entities. Similarly, any elements defined after the master DTD is included will take precedence over elements in the master DTD.
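To see the entity rule on its own, here is a minimal sketch using two made-up files. The file names (book.dtd, master.dtd) and the entity name (imprint) are hypothetical and are not taken from the PPStuff tree.

 <!-- book.dtd: hypothetical book-local DTD -->
 <!-- declared first, so this is the value the parser keeps -->
 <!ENTITY imprint "Special Edition">
 <!ENTITY % MasterDTD SYSTEM "master.dtd">
 %MasterDTD;

 <!-- master.dtd: hypothetical shared DTD -->
 <!-- a default; ignored when imprint has already been declared -->
 <!ENTITY imprint "Standard Edition">

A document that uses book.dtd sees &imprint; expand to "Special Edition", while one that loads master.dtd directly gets "Standard Edition".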

You can use these two facts together to allow the document structure to be redefined. For example, the master DTD defines the structure of tags that can appear in normal text flow using something like this:

 <!ENTITY % local.char.flow "">
 <!ENTITY % char.flow
  "#PCDATA
  | acronym
  | author
  | xmltag
  | xmltagpair
  | xref
  %local.char.flow;">

A tag that includes a character flow (like the <p> tag, which formats a paragraph) is defined in terms of this character flow entity:

 <!ELEMENT p ( %char.flow; )*>

The master DTD defines the local.char.flow entity to be empty. When it is included in the char.flow entity, it has no effect. But Chad’s book needs the ability to mark up Arabic text inside a paragraph. So in its book-local DTD, you’ll find:

 <!ENTITY % local.char.flow
  "| arabic
  | idxmethod">
 <!-- load standard PPBook -->
 <!ENTITY % PPBookDTD SYSTEM "../../../PPStuff/util/xml/ppbook.dtd">
 %PPBookDTD;
 <!ELEMENT arabic ( %char.flow; )* >

By defining local.char.flow before the definition in the master DTD, Chad’s DTD inserts <arabic> as a valid tag inside paragraphs. It then defines the content of the tag itself after including the master DTD.
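Put together, the effect is roughly the following. The expansion is described by hand in a comment instead of being written out in full, and the sample sentence is invented for illustration; it is not taken from Rails Recipes.

 <!-- what the parser effectively sees as the content of <p> in Chad's book:
      the standard character-flow tags, followed by | arabic | idxmethod -->

 <!-- a paragraph that is now valid in that book, and only in that book -->
 <p>The word <arabic>مرحبا</arabic> means hello.</p>

The same paragraph in a book that uses the unmodified master DTD would fail validation, because there local.char.flow is empty and <arabic> is never declared.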

Although this sounds complex, it actually gives us a lot of flexibility. The master toolset is never changed for a particular book, which means we can share it across all of the titles under production. If we want to make a global change, we change it in one place, and all books get the update. At the same time, each individual book has hooks into the toolchain to allow it to add customizations.

Next month we’ll see how those customizations work when it comes to formatting the book itself.

Dave Thomas is one of the Pragmatic Programmers.