small medium large xlarge

Thinking Functionally with Haskell

Playing with Haskell

by Paul Callaghan

Generic image illustrating the article
  Paul has introduced some of the key concepts of the Haskell language in the past two issues, and now puts them to work in an interesting exercise.  

The Word Chain Kata

It’s time to do some programming now, so let’s try a kata. We’re going to look at Dave Thomas’s Word Chain exercise. The objective is to find a series of steps from one word to another, where for each step a single letter is changed such that the resulting word is valid. For example, turning “lead” into “gold” with the path ["lead","load","goad","gold"]. For simplicity, I’ll assume four-letter words and lowercase throughout.

This problem is more “algorithmic” than “real world,” but it’s a useful step in our journey to larger use of Haskell or above—it is important to understand the core language and how to use it before we tackle anything more advanced. Plus, after last month, I think we all deserve a shorter article! [only marginally shorter!]

Where to start? This is often the hardest question. Personally, I like to start somewhere near the bottom and play around with ideas a bit, then start building upwards. (However: I must confess that I used a similar example in lectures for a few years so I know where it’s going. But I’ll try not to let this show too much....) As we go, we can reflect on what we’ve built and maybe refactor if or when we get some more insight about the problem.

Note that this isn’t a tutorial on programming in Haskell—we don’t have the time or space for that here, and there are several good references for this (as discussed below). Instead, I’m aiming to get you interested in some key ideas and give you something to think about.

Finally, remember the golden rule—it’s all about data, so the process will be all about identifying relevant data structures and transformations, and about decomposing big problems into smaller problems until we can solve them in a straightforward way. Remember the silver rule too—“trust your compiler,” so focus on clear code first before you worry too much about efficiency.

Development Tools and Language Version

I’m going to use the Glasgow Haskell Compiler (GHC) as the main development tool here. GHC is the most advanced implementation of Haskell, covering the standard language (Haskell-98) and many experimental extensions and libraries, including some based on cutting-edge research. As well as compiling Haskell to fast native code, it also provides a REPL-style tool called ghci.

Haskell-98 is a version of Haskell that was frozen around 1998, some 12 years after Haskell’s inception, to provide a stable version of the language and standard libraries that was suitable for education (including books and tutorials) and as a common target for all implementations of the language.

I won’t go into the history here, but recommend the paper "Being Lazy with Class" from 2007—by four of the main movers behind the language (Hudak, Hughes, Peyton Jones, Wadler)— as an excellent survey of Haskell’s genesis and development. For several of these articles, I will deliberately stick to Haskell-98 because of the standardization and because we won’t need any advanced features for a while.

For reference, the Haskell Platform provides an easy-to-install package of the main Haskell tools for standard platforms, including GHC and GHCi, Cabal (similar to Ruby’s Gem tool) and other build tools, plus many of the popular extra libraries (cf. Gems) already installed. Such additional libraries can be found on Hackage. Finally, there are some IDEs for Haskell, but I use vi.

Another option for Haskell-98 is the Hugs system: this is a smaller implementation and very useful for teaching: I used it in most of my classes, and still use it for basic programming.

The full code for this article is in a Github Gist. To play with the code, save it to a file with a .hs extension and run ghci on the file (or hugs, if you’re using that), then run various bits of code as you wish.

One Step at a Time

We’re looking to construct paths, and paths contain steps, so how about starting with the step operation? We want this to take a word and return all of the valid words that are reachable from the original word. For example, from “lead” we want to get the list


(the actual list depends on the dictionary you are using), but also to exclude non-words like “xead”.

Now, a key idea in functional programming is to split big problems into smaller ones, so how can we apply this here? We can guess that at some point we need to use the dictionary to filter out invalid words, but does the filtering need to be tied into the other parts of word generation? One common way to split problems is via the “generate and test” pattern, where one component suggests results and the other one discards invalid ones. (In my experience, undergrad programmers are experts in this pattern, although they tend to use it as a methodology—but that’s another story.) So, we can try having one component to generate a list of candidate words, and another to filter out the bad candidates.

A quick thought about types first: our basic step needs to have the type String -> [String], or from a word to a list of words. You might be itching to use more complex types after last month’s article (hurrah!) but let’s keep it simple for now. What do we have now? Let’s write it down.

 one_step :: String -> [String]
 one_step = filter is_valid_word . generate_candidates

Of course, this won’t compile because some code is missing (aka we have at least one failing test).

What does this code mean? The definition uses function composition (the dot), and says directly that the one_step operation is a composition (or combination) of generating candidates and filtering them. Recall that f . g is defined in the Haskell Prelude as \x -> f (g x), so the composition is a new function (the \x -> ... bit) that applies f to the result of g x. Programming-wise, we’re building a bigger transformation from two smaller ones.

You can read function compositions in any direction, whatever makes more sense to you. For example, right to left as a series of steps; i.e., generate the candidates then filter them; or left to right for a more “transformational” view: filtering the results from the candidate list. Notice that we don’t need to mention any parameters here: partly, we don’t need to (since the definition of function composition handles that for us), and partly we want to say that the big operation is a composition of two others and that’s all we need to say.

I’m going to leave word filtering for a bit, so I will just define the following and move on to the other missing code.

 is_valid_word = error "Need to finish"

The Prelude function error :: String -> a takes a string and causes a run-time exception when the expression is evaluated. That is, if the program ever tried to use is_valid_word then there would be a run-time error whose message included the "Need to finish" string. This function is useful for stubbing out unfinished code, and for signalling serious problems at run time. (It is a bit like throw for exceptions, and indeed, some exception support is built into Haskell—though not often used because we have other, softer alternatives.)

Note the type of error: it takes a string and the result type is a but there’s absolutely no constraint on a, i.e. it really does mean anything. This is sensible, because it allows us to flag errors anywhere, for example we can use it where a list of Ints is expected because the type variable matches anything. Plus, if you think about it, there’s no sensible operation that could have this type so it kind of has to be this error functionality!

Telling It Like It Is

Before we move on, notice that I used the expression filter is_valid_word rather than some other single (yet to be defined) function name. There are some important points about language use here. Firstly, I’m making the positive choice that I want to use filtering on the candidate list, because I know what data structures are in play at the time, and have decided that filtering is going to be appropriate. So instead of a simple decomposition (into two stages) I’ve gone a bit further on the detail.

Secondly, I’m using the language to say this directly rather than making up a new function name and adding another definition of that name. Compare these expressions: map (\x -> 2 + x) [1..10] and map add_two [1..10]—which is clearer? My point here is that Haskell is flexible enough that quite often, the code itself is the clearest explanation of the programmer’s intent, and adding in extra definitions can make the code harder to follow; e.g., you would have to check elsewhere in the code to see what add_two actually does. Of course, there are cases where such new definitions can be justified—but Haskell gives you the choice. (You can be even more terse, like map (2+) [1..10] if that’s your thing.)

Thirdly, what does the expression filter is_valid_word actually mean? Conceptually, it’s taking the general idea of filtering a list with some predicate and then fixing the predicate to be a certain test: the result is an operation that picks out the valid words from a list. The important point here is that in Haskell, this kind of expression is meaningful and hence a key part of the coding style. Let’s consider a simpler example.

 foo = filter (\x -> reverse x == x)
 bar = foo ["ada", "fred", "bob"]

The first line above names a combination of filter and a test for a list being a palindrome. (Haskell Strings are a list of characters—though this test for palindromes will work with a list of anything that has an equality test.) The combination can then be used with a list of words, as in the second line.

What’s the type of foo? Well, filter :: (a -> Bool) -> [a] -> [a]; i.e., it takes a predicate (takes some a value, returns a Bool) and a list of such a values and returns a list of the same. For the type of \x -> reverse x == x we have to do some type inference, but it comes out as Eq a => [a] -> Bool, i.e. testing a list of a values as long as we can do equality tests on a values themselves. Putting these together— which means matching the first parameter of filter with the predicate—we get filter (\x -> reverse x == x) :: Eq a => [[a]] -> [[a]], hence converting a list of list of a values into a similar list (assuming equality is known). A nested list of something might be a bit abstract if you are new to it, but it’s fine to think concrete examples, so in the context of bar it is going to take and return a list of strings.

Some people like to use the word “currying” at this point. I don’t—I think it’s a bit of a misleading distraction that adds little. Conceptually (and theoretically; i.e., the lambda calculus), all functions in Haskell take one argument at a time, and either return a new function or a concrete value, and it’s important to understand this. How filter behaved above is a direct consequence of this.

Pragmatically, we often define and use multi-argument functions as if they were just like their equivalents in (say) Ruby, but underneath, Haskell is still using the above mechanism. The compilers will optimize such multi-argument functions so they behave efficiently, so it makes little difference to actual performance. The key point is: we really don’t have to worry about it. Enjoy the flexibility instead!

Generating Candidates

Next, the operation that generates potential words for later filtering. Type-wise, we want generate_candidates :: String -> [String], so from a word we get all possible next steps. Is there a sensible possibility to make it polymorphic? It is always worth asking this, or else keeping the question in mind while coding. (There’s also a chance that something like this already exists in a library, though this is a bit “niche” and so it is unlikely. The Hoogle tool provides specialized search facilities over many Haskell libraries and so can help you check this.)

You need to know a bit about Strings first. Haskell strings are simple: just lists of characters, and these are lists in the Lisp sense—a list is empty or it is an element cons’d onto the front of a list. This means we can use all of the list functionality on strings, including map, filter, and the sugared versions of these known as list comprehensions.

Where do we start? Always good to start with some concrete examples, especially if the operation doesn’t seem obvious. From generate_candidates "" we expect no candidates, since there’s no way to change a single character in an empty string. From generate_candidates "a" we expect ["b", "c", ... "z"] because we change the character to every other character except "a". From generate_candidates "ba" we expect ["aa", "ca", "da", ... "za", "bb", "bc", ... "bz" ]—which is the list from changing the first character plus the list from changing the second character. We can encode these as test cases and there are several ways to do this—but I won’t do this now (that’s another article).

There’s a bit of a pattern here (which we should look to exploit in the code), but it’s not obviously a map or a filter or a fold, so the functional programmer’s usual next tactic is to start thinking about the main cases of the input. This means thinking about (a) the empty list case and (b) the cons case.

One benefit of thinking via patterns is that we can think about the cases independently, and looking at our examples above, we’ve already done the first case (there are no candidates) so let’s fill it in. This leaves:

 generate_candidates []
  = []
 generate_candidates (c:cs)
  = error "don't know"

The cons case is highly likely to involve recursion somewhere, and the trick of writing recursive code is to think about possible recursive calls and what they give you, and from the results how to construct what you actually want. In this case, we can call generate_candidates on any other string, but it’s likely to be the tail of the input (or maybe a substring of it). Let’s guess a call to generate_candidates cs (to produce all candidates from the tail string) and see where it gets us, specifically thinking about how we can use c, cs and the result from generate_candidates cs to build the result.

Let’s also think about the concrete examples for "ba" and "a" from above. The latter will be the result from the recursive call when processing "ba", so can we do something to ["b", "c", ... "z"] to get (part of) the overall result? Well, if we put character b at the front of each of the strings, we get the last part of the result list. Let’s fill this in.

 generate_candidates []
  = []
 generate_candidates (c:cs)
  = error "don't know"
  ++ [ c : xs | xs <- generate_candidates cs ]

Syntax [ expr | pattern <- list ] indicates a “list comprehension”, also seen in CoffeeScript, Python, certain branches of set theory.... It is a very convenient shorthand for combinations of maps and filters on lists and translates to a combination of map, filter and list concatenation. The code above produces a new list by looping through the elements produced by the recursive call and sticking head character c on the front of each—it’s a map in other words.

The first part of the result is not too hard either: it’s the result of putting any letter except for c at the front of the list tail cs. We can use a list comprehension here too, with a filtering step that skips the original character.

 generate_candidates []
  = []
 generate_candidates (c:cs)
  = [ x : cs | x <- ['a'..'z'], x /= c ] -- new at front
  ++ [ c : xs | xs <- generate_candidates cs ] -- old at front

Observe that when we’re dealing with the a one-character string, the recursive call will return an empty list, and mapping over it also produces an empty list—which is safe to append to the rest of the result. In short, we don’t have to handle this situation explicitly, it just naturally slots into place.

Is This Code OK?

We can test generate_candidates on the concrete examples and confirm that we get the expected outcome. Question for you: do we also need to test some more? It’s up for debate, and I don’t really have a definitive answer. You could nail it with dependent types, but pragmatically, is that overkill?

Also pause a while to consider whether the code makes sense, and whether it gives a clear enough view of how this part of the program works. We’ve reached this version of the code by a few careful steps of reasoning, filling in pieces at a time, and we did understand each step. Are we therefore confident that the code is correct, with or without testing? Is confidence alone enough? On the other hand, how can we be confident that our tests are sufficient?

What about alternative coding approaches though? Specifically, what would an imperative or OO programmer do? Would we be happier with these versions? One possibility is an outer loop on the string position and an inner loop on the replacement letters, generating a new candidate word on each iteration. Here’s a version of the code that works along similar principles.

 generate_v2 w = [ before ++ d : tail after
  | (before, after) <- all_splits_of w
  , not (null after)
  , d <- ['a' .. 'z']
  , d /= head after ]
 all_splits_of :: [a] -> [([a], [a])]
 all_splits_of w = zip (inits w) (tails w)

The all_splits_of function returns all ways of splitting a list into two while keeping the original element order; e.g., all_splits_of "lead" will give [("","lead"),("l","ead"),("le","ad"),("lea","d"),("lead","")]. We then loop over this list, replacing the head of the second part with a different character and rebuilding the string (except when the second part is empty). Is this clearer? Or can you find another version that does make more sense? TMTOWTDI.

It’s worth noting that this version touches on recent FP research on views: the idea is that we choose a view of the data that simplifies the code, rather than always working with a particular physical representation. (It has some relationship with the concept of views in databases.) So, we view a list as the join of two lists, and then loop through the various ways of splitting the list to get our answer. The FP work in views started with Wadler, and has been expanded significantly by McBride in the context of dependently typed programming (e.g. in Epigram).

There’s a strong flavor of using types to provide almost a DSL that makes the problem easier to solve. It’s an important idea—we’ll come back to it in later articles.

Dictionary Filtering

We’ll do this bit by loading a list of words into a hash-like structure and then testing whether a word is present. Here’s the code. It’s not a key detail here so I won’t explain it much.

 -- import lines should go at the top
 import Data.Set(Set, fromDistinctAscList, member)
 import System.IO.Unsafe(unsafePerformIO)
 dict = fromDistinctAscList
  $ lines
  $ unsafePerformIO (readFile "fours.txt")
 is_valid_word w = member w dict

The above loads a list of words from a file (which contains just four-letter words), splits the file into lines and then builds a lookup table from it. I’m using Haskell’s Set library, which uses efficient binary trees. The implementation isn’t important, as long as it gives reasonable performance. We can always switch to hash-based tables later if performance here ever becomes an issue. The key detail is that we can test word validity by testing membership of the set, as shown in the new definition of is_valid_word.

Syntactic detail: the $ operator is a neat trick to save writing too many right-nested parentheses, i.e. instead of f (g (h x)) one can write f $ g $ h x. You can think of $ as a kind of pipeline symbol, similar to how function composition is sometimes used, though it’s not building new functions or anything freaky. Its definition is f $ x = f x (boring) and the magic works entirely through the symbol being declared as a right associative operator. In contrast, division is declared as left associative; e.g., 1.0 / 2.0 / 3.0 is equivalent to (1.0 / 2.0) / 3.0 rather than 1.0 / (2.0 / 3.0). So this $ is just a useful bit of shorthand.

What’s that unsafe doing? It’s a Haskell programmer cutting a corner, basically doing some unofficial IO (the file read) at an arbitrary point in the code and turning the dictionary into a global variable. (What’s that sound? A fairy dying? No.) Normally, we should be careful about ordering IO actions so we don’t try to read before writing, etc., but this idiom is kind of excusable since the dictionary is used as a constant. (We’ll talk monads and IO at a later date. Bet you can’t wait.)

Breadth-first Searching

We can call one_step "lead" to get a list of next steps, and call one_step on any of the new words and so on. But, we’re after the shortest path between two words, and so we need a controlled way of exploring how steps are related in order to find such a path. The obvious thing is to try all one-step paths, then all two-step paths, then three, etc.

This kind of process is called a “breadth-first search” (BFS). I’ll show you a version of BFS that is phrased in terms of exploring a “state space.” This framework also makes it easier to experiment with other kinds of searching, like best-first search. The parent-to-list-of-children relationship suggests a tree, so let’s program it this way. (More generally, it could be a graph too, but let’s keep it simple for now.) Firstly, we need a type. Haskell’s standard library contains a module Data.Tree that provides a suitable type and some useful related functionality.

 -- contained in the Data.Tree module
 data Tree a
  = Node { rootLabel :: a, -- label value
  subForest :: Forest a -- zero or more child trees
 type Forest a = [Tree a]

So this is a tree with only one kind of node, and the node holds a value or label plus a list of zero or more child trees. The line type Forest a = [Tree a] defines a type synonym, providing a possibly more meaningful name for some type expression; i.e., a forest is a list of trees. The tree is (parametrically) polymorphic, of course. Concrete values look like this:

 eg1 = Node "foo" [Node "bar" []]
 eg2 = Node True [Node False [Node True []], Node True []]

The library contains drawTree :: Tree String -> String, which shows a simple text picture of the tree, so we can run putStr $ drawTree eg1 right away. How do we convert eg2? Well, we want to convert all of the Boolean values to strings, without changing the tree structure, and this sounds like a mapping operation. To cut a long story short, the Tree library implements the Functor interface too, which allows use of the overloaded fmap function for mapping over container types (like lists or trees), hence we can call putStr $ drawTree $ fmap show eg2.

Next, we’ll create the state space as a tree, where each node contains a state plus the trees that are reachable from that state. Notice the type first and what it describes: from a state-generating function and an initial state, we can produce the tree. It’s polymorphic too, because the tree-building process doesn’t care what’s in the tree. The implementation is straightforward: start a new tree at the current state and then collect the state space trees produced by all of the children. When a state has no children, there’s nothing to recurse on, hence no subtrees—just an empty list in the node.

 import Data.Tree -- put at top of file
 state_tree :: (s -> [s]) -> s -> Tree s
 state_tree children_of state
  = Node state [ state_tree children_of s | s <- children_of state ]

Now, you wouldn’t run this code directly in a conventional setting, but it’s fine in a lazy language because we’ll only need to generate enough of the tree to satisfy what the caller code requires—even if the tree’s depth was not finite. (If a state generated huge numbers of children, that would be an issue—but it’s an issue for other approaches too....)

Still, if we tried drawTree on an infinite tree, it would just keep on going or maybe run out of memory eventually. Let’s define another function to help us get sensible output: prune n t returns the structure from t down to a depth of n. There are two main cases, a depth of zero (or below) and a positive depth. When we hit zero, then truncate the tree by dropping its children. Otherwise, build a new tree based on the pruned child trees.

 prune :: Int -> Tree a -> Tree a
 prune d (Node x cs)
  | d <= 0 = Node x []
  | otherwise = Node x [ prune (d - 1) c | c <- cs ]

With this, code like putStr $ drawTree $ fmap show $ prune 3 $ state_tree (\s -> [s+1, s+1]) 0 now generates something useful. (Nothing significant here—it’s just a test value. The state generator here just says, from each state—which happens to be a number—it has two child states of that number with one added, hence it generates a tree that shows the depth at each level and has a fan-out of two children at each step.)

Now for BFS, we need to go through the state tree in breadth-first order. To keep things general, i.e., free from any application-specific details, we’re going to write a traversal function (to visit all of the states in order) and later filter the resulting state list for goal states. The traversal function will have the type bft :: Tree a -> [a]. It’s a bit trickier, but the technique here is to get a second function (here called bft_) to handle a list of trees at a time, then the main function calls it with the original tree in a singleton list.

 bft :: Tree a -> [a]
 bft t = bft_ [t]
 bft_ :: [Tree a] -> [a]
 bft_ trees = [ x | Node x _ <- trees ]
  ++ bft_ (concat $ map subForest trees)

You can think of bft_ as taking the next list of trees, skimming off the root values of the trees, and adding on the result of traversing the various child trees. The child traversal works by taking all of the children, flattening the lists of children to a flat list, and then running bft_ on this new list. If it helps, draw the picture and check how the code mirrors an informal level-by-level sweep.

The last step is to identify when we have found “goal” states—states that solve the required problem. We can do this by passing in a predicate of type a -> Bool and using it when filtering the output of bft. Putting this all together, we get the following.

 breadth_first_search :: (s -> Bool) -- goal test
  -> (s -> [s]) -- state generator
  -> s -- initial state
  -> [s] -- solutions out
 breadth_first_search is_goal children_of
  = filter is_goal
  . bft
  . state_tree children_of

To recap: we build the state tree, then do a breadth-first traversal of it to get a list of states to try, then return the ones that correspond to solved goals. Now compare this to implementations in your favorite language. More modular, no? And more reusable?

Exercise for the reader: write the code to do depth-first traversal, then generalize the search function to allow the search strategy to be supplied as a new argument.

Using the Search

Let’s plug the word chain code into the above and see what happens, e.g., defining the following and running word_chain "lead" "gold". You should see the word “gold” popping out every few seconds, which corresponds to each successful path being found. (There might be a short delay while the dictionary is loaded into memory on the first run.)

 word_chain a b = breadth_first_search (\w -> w == b) one_step a

This confirms that paths exist, but we want to know the intermediate steps. The way to do this is to change the state type from a String to something that keeps track of where it’s been, i.e., a list of previous words. The following code lightly wraps up the basic code to maintain this list, testing against the head word in the list, and creating child states by expanding the head word and pre-pending the new child words to the previous path.

 type Path = [String]
 goal_path :: String -> Path -> Bool
 goal_path target ws = target == head ws
 child_paths :: Path -> [Path]
 child_paths (w:ws)
  = [ s : w : ws | s <- one_step w, s `notElem` ws ]
 init_path :: String -> Path
 init_path w = [w]
 word_chain_path a b
  = breadth_first_search (goal_path b) child_paths
  $ init_path a

Running word_chain_path "ruby" "code" gets us a list of results, each a valid path, and the paths increase in length as the search enters deeper levels.

Note that it’s fairly usual to wrap such code up in a better abstraction. For example, the combination of goal test, state generator, and initial state do effectively define a search problem so it seems sensible to hold them in a record. It’s easy to adjust the search code to expect this as input. From this, we can recast the move to paths as converting a simple search problem object into a more complex one, e.g., convert_to_path_search :: SearchProblem s -> SearchProblem [s]. We could also encapsulate the path inside a new type so it’s not just a bare list any more.

Performance and Optimization

Is this code fast enough? Using the Hugs interpreter, it takes around a second or two to do the lead-gold example (both directions), and a few seconds more for the ruby-code example (also both directions). With the GHC compiler, this reduces to a fraction of a second—even without turning on the optimization options of the compiler. It’s probably fast enough.

GHC provides some good profiling tools to help diagnose and track performance issues. The following snippet is part of the standard time and space profile produced by running the ruby-to-code search 1000 times (taking around 10 seconds, so 0.01 seconds per search). This is without any compiler optimizations.

The profiler result indicates that 78% of the time was spent looking up words, and 14% of the time calculating the candidates. For memory, 84% of the total allocation was spent on the generated words. A similar pattern is seen when searching the reverse direction, albeit some 20 times slower(!)— though this could be a quirk of these words (we’d have to see this pattern on a bigger test set before concluding it was really a code problem).

  CALLS time space TIME SPACE
 state_tree 655000 0.8 4.3 95.9 93.7
  child_paths 58000 0.5 3.1 95.2 89.4
  one_step 0 2.3 1.7 94.6 86.3
  is_valid_word 5703000 77.9 0.0 77.9 0.0
  generate_candidates 286000 14.4 84.6 14.4 84.6

Now that we know this, if time or memory really were an issue then we could start analyzing the offending operations, maybe even fusing them together in some way to reduce overheads, or adding in cheaper word validity tests (like: reject words with no vowels). Until then, we can stick with the clear code.

Summing Up

We’ve not used many complex types, but types have still been useful in the process to describe what data we’re manipulating, and as a guide for how certain operations should behave. There’s room for a bit more type use in this Haskell-98 code, maybe wrapping up search problems as bundles, or perhaps wrapping valid words inside a new type to distinguish them from unchecked words.

What about dependent types? What details are we worried about, that we’d like the extra confidence for? I’ll let you think about this, but you might like to think about confirming that all candidate words are the same length, or conditions on the state generation to ensure that the states don’t inadvertently cause infinite loops or repeat work, or that the state space is finite (which would confirm that a tree-based approach is adequate rather than needing the full graph-based approach).

I’ve not said much about testing either. We can certainly add tests to double-check that code is doing what we expect, but we’ve also managed to split various concerns apart so it’s clear how each part works—indeed reflecting Hoare’s comment on “obviously no deficiencies” vs “no obvious deficiencies.”

When we code like this, it’s less of a “black box” situation—in the sense that testing does not say much about the code itself, just about its inputs and outputs, so it doesn’t really address code quality. So, what extra tests do we need?

Finally, I hope you have some appreciation of how we can use the language to keep the details simple and modular, and what a difference it makes to the final program. This isn’t just a happy coincidence—many functional programs do develop like this, and it really does help us work with much more complex programs. Even if you decide not to try Haskell, still be aware when you are programming of what is possible in the language, how to articulate your solutions, and think about the wider picture. Oh yeah, and keep practicing and having fun!

A Few Comments on Haskell Resources

We’re getting into more technical detail with Haskell here, but I’m not planning to write a tutorial on Haskell or repeat much of the good material already available. How do I expect you to fill in the gaps? Here are my suggestions and thoughts on relevant references:

  • My favorite introductory book is Graham Hutton’s Programming in Haskell. It’s a clear and crisp overview of Haskell-98 in around 180 pages, together with some significant examples (such as programming and optimizing the Countdown arithmetic problem). In my view, it is the K&R of Haskell—all the key details with little unnecessary filler. Experienced programmers from other languages should definitely take a look at it. (You can also find online videos of lectures by Erik Meijer—of LINQ fame amongst other things— based on Hutton’s book.)

  • A useful backup reference is A Gentle Introduction to Haskell by Hudak, Peterson, and Fasel. It’s available in a variety of formats online, and is a readable companion to the full Haskell Report. The language report itself defines the full technical details of the language, should you need it.

  • Real World Haskell by O’Sullivan, Stewart, and Goerzen from 2008 is a good overview of the language and some practical libraries, together with several significant programming examples. The content is available free online but also published in book form by O’Reilly. This book is notable for the way it mobilized the Haskell community to help: people were invited to comment on working drafts, via a web site allowing comments at the paragraph level, and the final book took many of these comments into account.

    The book is pretty comprehensive in its coverage and contains many good programming examples, so is definitely worth a read when you are ready, but beware that some libraries have developed significantly since 2008 so many better alternatives exist now. A particular case is programming for web applications: there are now several strong Haskell-based frameworks such as Yesod, Snap, and HAppStack.

My main criticism is that it tries to be too many things to too many people, and doesn’t always pull this off. (It would be interesting if the next edition could separate the original book into two or three smaller books. This could make library updates easier to manage too.) For example, there’s around 100 pages of introductory material before the real programming starts—not the most exciting start, and Hutton’s 2007 book already covers most of this. Plus (for me), it doesn’t really say enough about the pragmatic aspects and wider picture.

  • Some people may like Learn You a Haskell. This book (with a free Html version online) has been likened in style to Why the Lucky Stiff's Ruby Guide. Some of the explanations are good, especially for a different perspective from the standard, though the author’s approach can be wearing after a while.

  • Finally, the main Haskell web site has many links to other useful resources.

So, many good references—but as I mentioned in an earlier article, I think there’s still a gap in the market for a more pragmatic approach, hence my reason for attempting something in these articles. I’d be very grateful for suggestions of what else to cover, and for any feedback, so please join in the conversation in the PragPub forums!

Dr. Paul Callaghan rides a big, fast motorbike, but his motorbike has no chain—it has a drive shaft. So, no messy cleaning, oiling, weekly checks, regular maintenance, and no worn sprockets. Neither will his chain fall off. He prefers his programming languages to be similarly hassle-free.

Bits of his bio can be seen on earlier articles. Paul also flies big traction kites and can often be seen being dragged around inelegantly on the beaches of North-east England, much to the amusement of his kids. He blogs at

Send the author your feedback or discuss the article in the magazine forum.