From seed to full bloom, Ambrose takes us through the steps to grow a domain-specific language in Clojure.
Lisps like Clojure are well suited to creating rich DSLs that integrate seamlessly into the language.
You may have heard Lisps boasting about code being data and data being code. In this article we will define a DSL that benefits handsomely from this fact.
We will see our DSL evolve from humble beginnings, using successively more of Clojure’s powerful and unique means of abstraction.
Our goal will be to define a DSL that allows us to generate various scripting languages. The DSL code should look similar to regular Clojure code.
For example, we might use a single Clojure form as input and generate either Bash or Windows Batch script output from it.
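Here is a hypothetical illustration (an assumption, not necessarily the form the article had in mind): an ordinary if form that runs as normal Clojure. The comments show the Bash and Windows Batch scripts a generator might emit for it.

```clojure
;; Hypothetical input: plain Clojure that runs as-is (prints b).
(if (= 1 2)
  (println "a")
  (println "b"))

;; Bash script a generator might emit for it:
;;   if [ 1 -eq 2 ]; then
;;     echo "a"
;;   else
;;     echo "b"
;;   fi

;; Windows Batch script a generator might emit for it:
;;   IF 1==2 (
;;     ECHO a
;;   ) ELSE (
;;     ECHO b
;;   )
```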
We might, for example, use this DSL to dynamically generate scripts to perform maintenance tasks on server farms.
Baby Steps: Mapping to Our Domain Language
I like Bash, so let’s start with a Bash script generator.
To start, we need to expose some parallels between Clojure’s core types and our domain language.
So which Clojure types have simple analogues in Bash script?
Strings and numbers should simply return their string representation, so we will start with those.
Let’s define a function emit-bash-form that takes a Clojure form and returns a string that represents the equivalent Bash script.
The case expression is synonymous here with a C or Java switch statement, except it returns the consequent. Everything in Clojure is an expression, which means it must return something.
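A minimal sketch of emit-bash-form for strings and numbers follows; it is an illustration, not the original listing. It assumes a current Clojure, where integer literals read as java.lang.Long. It also writes the switch with condp = rather than case: Clojure’s case compares against unevaluated literal constants, so a class name such as java.lang.String in a case clause would be read as a symbol and would never match a Class object.

```clojure
;; Translate a Clojure value into its Bash equivalent.
;; condp = compares (class a) against each evaluated class in turn,
;; and the final lone expression is the default.
(defn emit-bash-form [a]
  (condp = (class a)
    java.lang.String a        ; strings are emitted as-is
    java.lang.Long   (str a)  ; integer literals
    java.lang.Double (str a)  ; floating-point literals
    (throw (ex-info "No Bash translation for form" {:form a}))))

(emit-bash-form "a")   ;=> "a"
(emit-bash-form 1)     ;=> "1"
(emit-bash-form 2.5)   ;=> "2.5"
```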
Now if we want to add some more dispatches, we just need to add a new clause to our case expression.
Echo and Print
Let’s add a feature.
Bash prints to the screen using echo. You’ve probably seen it if you’ve spent any time with a Linux shell.
clojure.core also contains a function println that has similar semantics to Bash’s echo.
Wouldn’t it be cool if we could pass (println "a") to emit-bash-form?
At first, this seems like asking the impossible.
To make an analogy with Java, imagine passing System.out.println("asdf") as the argument to a Java method and expecting that argument to equal the code System.out.println("asdf") itself.
(Let’s ignore the fact that System.out.println() returns void.)
Java evaluates the arguments before you can even blink, resulting in a function call to println. How can we stop this evaluation and return the raw code?
Indeed, this is impossible in Java. And even if it were possible, what could we do with the raw code?
System.out.println("asdf") is not a Collection, so we can’t iterate over it; it is not a String, so we can’t partition it with regular expressions.
Whatever “type” the raw code System.out.println("asdf") has, it’s not meant to be known by anyone but compiler writers.
Lisp turns this notion on its head.
Lisp Code Is Data
A problem with raw code forms in Java (assuming it is possible to extract them) is the lack of facilities to interrogate them. How does Clojure get around this limitation?
To get to the actual raw code at all, Clojure provides a mechanism to stop evaluation via the tick. Prepending a tick (aka quote) to a code form prevents its evaluation and returns the raw Clojure form.
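A short sketch of the difference the tick makes:

```clojure
;; Evaluated normally: prints a and returns nil.
(println "a")

;; Quoted with a tick: not evaluated; the raw form comes back.
'(println "a")
;=> (println "a")

;; The tick is reader shorthand for the quote special form.
(= '(println "a") (quote (println "a")))
;=> true
```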
So what is the type of our result?
We can now interrogate the raw code as if it were any old Clojure list (because it is!).
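Asking for the class answers the question above: a quoted list is a clojure.lang.PersistentList, Clojure’s ordinary list type, so the usual sequence functions apply. A short sketch (the name raw-form is just for illustration):

```clojure
(def raw-form '(println "a"))  ; hypothetical name for the quoted form

(class raw-form)            ;=> clojure.lang.PersistentList
(first raw-form)            ;=> println  (a symbol, not the function)
(symbol? (first raw-form))  ;=> true
(second raw-form)           ;=> "a"
(count raw-form)            ;=> 2
```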
This is a result of Lisp’s remarkable property of code being data.
A Little Closer to Clojure
Using the tick, we can get halfway to a DSL that looks like Clojure code.
Let’s add this feature to emit-bash-form. We need to add a new clause to the case form. Which type should the dispatch value be?
So let’s add a clause for clojure.lang.PersistentList.
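Extending the earlier sketch with a clojure.lang.PersistentList clause might look like this. The list’s first element is a symbol, so this sketch switches on its name with case (string literals are valid case constants). As a choice made only for this sketch, the println argument is itself emitted recursively.

```clojure
(defn emit-bash-form [a]
  (condp = (class a)
    clojure.lang.PersistentList
    ;; A quoted list is a call: dispatch on the operator's name.
    (case (name (first a))
      "println" (str "echo " (emit-bash-form (second a)))
      (throw (ex-info "Unsupported operator" {:form a})))

    java.lang.String a
    java.lang.Long   (str a)
    java.lang.Double (str a)
    (throw (ex-info "No Bash translation for form" {:form a}))))

(emit-bash-form '(println "a"))  ;=> "echo a"
(emit-bash-form '(println 1))    ;=> "echo 1"
```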
As long as we remember to quote the argument, this is not bad.
Multimethods to Abstract the Dispatch
We’ve made a good start, but I think it’s time for some refactoring.
Currently, to extend our implementation we add to our function emit-bash-form. Eventually this function will be too large to manage; we need a mechanism to split this function into more manageable pieces.
Essentially emit-bash-form is dispatching on the type of its argument. This dispatch style is a perfect fit for an abstraction Clojure provides called a multimethod.
Let’s define a multimethod called emit-bash. Here is the complete multimethod.
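A sketch of emit-bash as a multimethod. It carries over the assumptions of the earlier sketches (java.lang.Long for integer literals, recursion into the println argument) and is an illustration rather than the original listing.

```clojure
;; The dispatch function: choose a method by the class of the form.
(defmulti emit-bash
  (fn [form]
    (class form)))

;; A quoted list is treated as a call; only println is handled here.
(defmethod emit-bash clojure.lang.PersistentList
  [form]
  (case (name (first form))
    "println" (str "echo " (emit-bash (second form)))
    (throw (ex-info "Unsupported operator" {:form form}))))

(defmethod emit-bash java.lang.String [form] form)
(defmethod emit-bash java.lang.Long   [form] (str form))
(defmethod emit-bash java.lang.Double [form] (str form))

(emit-bash "a")             ;=> "a"
(emit-bash '(println "a"))  ;=> "echo a"
```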
A multimethod is actually fairly similar to a case form. Let’s compare this multimethod with our previous case expression. defmulti creates a new multimethod and associates it with a dispatch function.
This is very similar to the first argument to case.
defmethod is used to add “clauses,” known as methods. Here java.lang.String is the “dispatch value,” and the method returns the form as-is.
This is similar to adding clauses to our case expression.
Notice how the multimethod is like a more flexible case expression.
We can put methods wherever we like; anyone who can see the multimethod can add their own method from their own namespace. This is much more “open” than a case form, in which all clauses are required to be in the same code form.
Compare this with Java, where a class’s methods must be defined inside the class itself, often in code that you don’t control. This common situation highlights some advantages of separating class definitions from implementation inheritance.
Compared to case, multimethods also have an important advantage of being able to add new dispatches without disturbing existing code.
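To see that openness in action, here is a sketch with two namespaces in one file. The names dsl.bash and dsl.extension are hypothetical. The second namespace adds a java.lang.Long method to a multimethod it does not own, without editing the first.

```clojure
;; The namespace that owns the multimethod.
(ns dsl.bash)

(defmulti emit-bash (fn [form] (class form)))
(defmethod emit-bash java.lang.String [form] form)

;; A different namespace extends it, leaving dsl.bash untouched.
(ns dsl.extension)

(defmethod dsl.bash/emit-bash java.lang.Long [form]
  (str form))

(dsl.bash/emit-bash "a")  ;=> "a"
(dsl.bash/emit-bash 42)   ;=> "42"
```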
So how can we use emit-bash? Calling a multimethod is just like calling any Clojure function.
The dispatch is silently handled under the covers by the multimethod.
Extending our DSL for Batch Script
Let’s say I’m happy with the Bash implementation. I feel like starting a new implementation that generates Windows Batch script. Let’s define a new multimethod, emit-batch.
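A sketch of emit-batch under the same assumptions as the emit-bash sketch. Everything except the clojure.lang.PersistentList method is a copy of its emit-bash counterpart.

```clojure
(defmulti emit-batch
  (fn [form]
    (class form)))

;; The only method that differs: Batch prints with ECHO.
(defmethod emit-batch clojure.lang.PersistentList
  [form]
  (case (name (first form))
    "println" (str "ECHO " (emit-batch (second form)))
    (throw (ex-info "Unsupported operator" {:form form}))))

;; Identical to the emit-bash methods.
(defmethod emit-batch java.lang.String [form] form)
(defmethod emit-batch java.lang.Long   [form] (str form))
(defmethod emit-batch java.lang.Double [form] (str form))

(emit-batch '(println "a"))  ;=> "ECHO a"
```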
We can now use emit-batch and emit-bash when we want Batch and Bash script output respectively.
Comparing the two implementations reveals many similarities. In fact, the only dispatch that differs is clojure.lang.PersistentList!
Some form of implementation inheritance would come in handy here.
We can tackle this with a simple mechanism Clojure provides to define global, ad-hoc hierarchies.
When I say this mechanism is simple, I mean that it is not compound: inheritance is not bundled into the mechanism for defining classes or namespaces but is provided as separate functionality.
Contrast this to languages like Java, where inheritance is tightly coupled with defining a hierarchy of classes.
We can derive relationships from names to other names, and between classes and names. Names can be symbols or keywords. This is both very general and powerful!
We will use (derive child parent) to establish a parent/child relationship between two keywords. isa? returns true if the first argument is equal to, or derived from, the second in the global hierarchy.
Let’s define a hierarchy in which the Bash and Batch implementations are siblings.
Let’s test this hierarchy.
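A sketch of that hierarchy and a few checks. The keywords are auto-resolved (::bash becomes :user/bash in the user namespace). That matters because derive on the global hierarchy requires namespace-qualified keywords.

```clojure
;; ::common is the parent; ::bash and ::batch are siblings beneath it.
(derive ::bash  ::common)
(derive ::batch ::common)

(isa? ::bash  ::common)  ;=> true
(isa? ::batch ::common)  ;=> true
(isa? ::bash  ::batch)   ;=> false (siblings are unrelated)
(parents ::bash)         ;=> #{:user/common} (in the user namespace)
```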
Utilizing a Hierarchy in a Multimethod
We can now define a new multimethod emit that utilizes our global hierarchy of names.
The dispatch function returns a vector of two items: the current implementation (either ::bash or ::batch), and the class of our form (like emit-bash’s dispatch function).
*current-implementation* is a dynamic var, which can be thought of as a global variable whose value can be rebound on a per-thread basis.
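A sketch of the var and the dispatch function. The dispatch function is pulled out under a hypothetical name, emit-dispatch, only so we can call it directly and inspect the dispatch value. The last two lines show why vector dispatch values work with the hierarchy: isa? compares vectors element by element.

```clojure
;; Dynamic var holding the implementation in effect for this thread.
(def ^:dynamic *current-implementation* nil)

(derive ::bash  ::common)
(derive ::batch ::common)

;; Hypothetical helper: [implementation, class of form].
(defn emit-dispatch [form]
  [*current-implementation* (class form)])

(defmulti emit emit-dispatch)

(binding [*current-implementation* ::bash]
  (emit-dispatch "a"))
;=> [:user/bash java.lang.String] (in the user namespace)

(isa? [::bash java.lang.String] [::common java.lang.String])  ;=> true
(isa? [::bash java.lang.String] [::common java.lang.Long])    ;=> false
```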
In our hierarchy, ::common is the parent, which means it should provide the methods in common with its children. Let's fill in these common implementations.
Remember the dispatch value is now a vector, notated with square brackets. In particular, in each defmethod the first vector is the dispatch value (the second vector is the list of formal parameters).
This should look familiar. The only methods that need to be specialized are those for clojure.lang.PersistentList, as we identified earlier. Notice the first item in the dispatch value is ::bash or ::batch instead of ::common.
The ::common implementation is intentionally incomplete; it merely exists to manage any common methods between its children.
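A sketch of the complete emit, with the shared ::common methods and the two specialized list methods. The same assumptions as the earlier sketches apply (java.lang.Long for integer literals, recursion into the println argument); the methods are illustrative rather than the original listing.

```clojure
(def ^:dynamic *current-implementation* nil)

(derive ::bash  ::common)
(derive ::batch ::common)

(defmulti emit
  (fn [form]
    [*current-implementation* (class form)]))

;; Methods shared by every implementation hang off ::common.
(defmethod emit [::common java.lang.String] [form] form)
(defmethod emit [::common java.lang.Long]   [form] (str form))
(defmethod emit [::common java.lang.Double] [form] (str form))

;; Only the list (call) case is specialized per implementation.
(defmethod emit [::bash clojure.lang.PersistentList] [form]
  (case (name (first form))
    "println" (str "echo " (emit (second form)))
    (throw (ex-info "Unsupported operator" {:form form}))))

(defmethod emit [::batch clojure.lang.PersistentList] [form]
  (case (name (first form))
    "println" (str "ECHO " (emit (second form)))
    (throw (ex-info "Unsupported operator" {:form form}))))
```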
We can test emit by rebinding *current-implementation* to the implementation of our choice with binding.
Because we didn’t define an implementation for [::common clojure.lang.PersistentList], the multimethod falls through and throws an Exception.
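A sketch of those tests. It repeats a trimmed, Bash-only copy of the definitions so it runs on its own. When no method matches, Clojure’s multimethods signal the failure with an IllegalArgumentException.

```clojure
;; Trimmed copy of the definitions (Bash only).
(def ^:dynamic *current-implementation* nil)
(derive ::bash ::common)
(defmulti emit (fn [form] [*current-implementation* (class form)]))
(defmethod emit [::common java.lang.String] [form] form)
(defmethod emit [::bash clojure.lang.PersistentList] [form]
  (case (name (first form))
    "println" (str "echo " (emit (second form)))
    (throw (ex-info "Unsupported operator" {:form form}))))

;; Select Bash for the extent of the binding form.
(binding [*current-implementation* ::bash]
  (emit '(println "a")))
;=> "echo a"

;; ::common has no list method, so dispatch fails; return the message.
(try
  (binding [*current-implementation* ::common]
    (emit '(println "a")))
  (catch IllegalArgumentException e
    (.getMessage e)))
```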
Multimethods offer great flexibility and power, but with power comes great responsibility. Just because we can put our multimethods all in one namespace doesn’t mean we should. If our DSL becomes any bigger, we would probably separate all Bash and Batch implementations into individual namespaces.
This small example, however, is a good showcase for the flexibility of decoupling namespaces and inheritance.
Icing on the Cake
We’ve built a nice, solid foundation for our DSL using a combination of multimethods, dynamic vars, and ad-hoc hierarchies, but it’s a bit of a pain to use.
Let’s eliminate the boilerplate. But where is it?
The binding expression is a good candidate. We can reduce the chore of rebinding *current-implementation* by introducing with-implementation (which we will define soon).
That’s an improvement. But there’s another improvement that’s not as obvious: the quote used to delay our form’s evaluation. Let’s use script, which we will define later, to get rid of this boilerplate:
This looks great, but how do we implement script? Clojure functions evaluate all their arguments before evaluating the function body, exactly the problem the quote was designed to solve.
To hide this detail we must wield one of Lisp’s most distinctive features: the macro.
The macro’s main drawcard is that it doesn’t implicitly evaluate its arguments. This is a perfect fit for an implementation of script.
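A minimal sketch of script, assuming an emit multimethod like the one above lives in the same namespace. ~form splices in the caller’s form, and the quote around it keeps that form from being evaluated.

```clojure
;; (script (println "a")) becomes (emit (quote (println "a"))).
;; The form reaches the macro unevaluated, and the quote keeps it that way.
(defmacro script
  [form]
  `(emit '~form))
```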
(Note that the template in script begins with a backtick, Clojure’s syntax-quote, rather than an ordinary quote.)
To get an idea what is happening, here’s what a call to script returns and then implicitly evaluates.
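You can ask Clojure for that expansion with macroexpand-1. Here the macro is redefined so the block runs on its own. The namespace prefix on emit depends on where the code is evaluated (user, in this sketch).

```clojure
(defmacro script [form]
  `(emit '~form))

;; The code that (script (println "a")) returns, before evaluation:
(macroexpand-1 '(script (println "a")))
;=> (user/emit (quote (println "a"))) (in the user namespace)
```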
It isn’t crucial that you understand the details; rather, appreciate the role that macros play in cleaning up the syntax.
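We will also implement with-implementation as a macro, but for a different reason than script. Its body must be evaluated inside a binding form, so the macro has to place the body there before it is evaluated; a function would receive the body’s value already computed, outside the binding.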
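A sketch of with-implementation as a macro. The body is spliced, unevaluated, inside binding, so it runs while *current-implementation* is rebound. The var is redefined here only so the block stands alone.

```clojure
(def ^:dynamic *current-implementation* nil)

;; Wrap the body in a binding form before anything is evaluated.
(defmacro with-implementation
  [impl & body]
  `(binding [*current-implementation* ~impl]
     ~@body))

(with-implementation ::bash
  *current-implementation*)
;=> :user/bash (in the user namespace)

*current-implementation*  ;=> nil (the binding has ended)
```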
(Again, the template begins with a backtick.)
Roughly, here is the lifecycle of our DSL, from the sugared wrapper to our unsugared foundations.
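This lifecycle can be checked directly. The sketch bundles trimmed, Bash-only versions of the definitions, then evaluates the sugared form alongside each hand-expanded stage. All three should produce the same string.

```clojure
;; Minimal definitions so the lifecycle below runs on its own.
(def ^:dynamic *current-implementation* nil)
(derive ::bash ::common)

(defmulti emit (fn [form] [*current-implementation* (class form)]))
(defmethod emit [::common java.lang.String] [form] form)
(defmethod emit [::bash clojure.lang.PersistentList] [form]
  (case (name (first form))
    "println" (str "echo " (emit (second form)))
    (throw (ex-info "Unsupported operator" {:form form}))))

(defmacro script [form]
  `(emit '~form))

(defmacro with-implementation [impl & body]
  `(binding [*current-implementation* ~impl] ~@body))

;; 1. Sugared: what a user of the DSL writes.
(with-implementation ::bash
  (script (println "a")))

;; 2. After with-implementation expands.
(binding [*current-implementation* ::bash]
  (script (println "a")))

;; 3. After script expands: the unsugared foundation.
(binding [*current-implementation* ::bash]
  (emit '(println "a")))

;; All three evaluate to "echo a".
```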
It’s easy to see how a few well-placed macros can put the sugar on top of strong foundations. Our DSL really looks like Clojure code!
We have seen many of Clojure’s advanced features working in harmony in this DSL, even though we incorporated them incrementally. Generally, Clojure lets us switch implementation strategies with minimum fuss.
This is notable when you consider how much our DSL evolved.
We initially used a simple case expression, which we converted into two multimethods, one for each implementation. Because a multimethod is called just like an ordinary function, the transition was seamless for any existing test code. (In this case I renamed the function for clarity.)
We then merged these multimethods, utilizing a global hierarchy for inheritance and dynamic vars to select the current implementation.
Finally, we devised a pleasant syntactic interface with two simple macros, eliminating the last bit of boilerplate that other languages would have to live with.
I hope you have enjoyed following the evolution of our little DSL. This DSL is based on a simplified version of Stevedore by Hugo Duncan. If you are interested in how this DSL can be extended, you can do no better than browsing the source code of Stevedore.
Ambrose Bonnaire-Sergeant is a Computer Science student at the University of Western Australia. He is passionate about functional languages, Clojure being his current favourite. In his spare time, Ambrose likes to learn new programming languages, play his clarinet and sing in local choirs. If you are in Western Australia and are looking to start a Clojure or functional programming user group, you can contact Ambrose at email@example.com.
This article was written in Vim using Meikel Brandmeyer’s VimClojure plugin.