In which we begin an exploration into the Haskell language and dive deeply into functional programming.

Ever wondered how functional programmers think? I aim to give you a glimpse into the programming style and mindset of experienced functional programmers, so you can see why we are so passionate about what we do. We'll also discuss some wider ideas about programming, such as making our languages fit the problem and not the other way round, and how this affects language design.

Few of these ideas get the exposure they deserve in textbooks or tutorials, and in my view they are essential for coming to grips with a functional language and using it productively in real apps.

Syntax and semantics, the meat and veg of most books and university courses, are ok for basic language use, but to really master a language that embodies a paradigm that is new to you, you need to know about the deeper pragmatic ideas. Let's see if we can do something about that.

I used Lisp for a few years before university, then switched to Haskell and have been using it for around 20 years. However, inspired by learning about Rails and Ruby when revamping a tired web technology course, I changed career to do full-time Rails work, and have spent the last four years having fun on a variety of apps, including Spree (#2 committer at one point) and recently a big bespoke lab management system.

Ruby feels like naughty fun for a Haskell programmer. Many of the ideas are very similar, like the very natural use of blocks and lambdas and having lots of scope for bending the rules. I really enjoy programming in Ruby, though sometimes I do get homesick and pine for a bit more oomph.

Most of this article will refer to Haskell, though many of the ideas do apply to other, similar languages as well. Haskell has a few advantages and a good balance of features. Haskell has its weaknesses too, and I hope to explore these in due course.

The Golden Rule

The most important idea in modern functional programming is this:

It's all about data

That is, functional programming is all about putting data first. We think about what kinds of data we have in a problem domain, and what kinds of transformations we want on them. Then we start building up the data structures and the code to do the transformations.

Functional programming is not about pure functions any more (or we'd still be using lambda calculus). What modern functional languages are about is developing ever-better tools to help in this style of programming, thus providing easier ways to specify powerful data types, manipulate them at a high level, break them apart and recombine them, and to do all this with a degree of safety and reusability and without too much syntactic baggage or implementation detail cluttering up the code.

Remembering to put data first is key if you want to use these tools well. Taking a fully imperative approach in a functional language doesn't end happily. The tools aren't designed to work that way.

A First Exercise

Ever used pipes in Unix shells? They are a very good introduction to functional thinking, and it's worth lingering a moment in Unixland before jumping into Haskell.

For example, suppose you want the three most memory-hungry processes owned by users whose login names begin with "foo"? There's no magic Unix command for this, but you can easily build one by assembling smaller pieces. This is a key idea in Unix, which provides small tools to perform steps on data, and a way to glue the steps together. I'll leave the fine details of this example to you, but basically you use ps to list processes, use grep to select matching rows, then sort on a particular column, and finally use head to grab the first few rows.

Another example: how do you count how many processes run by users Fred or Joe use more than 100M in virtual memory? You might need wc to count rows, and awk (or even perl or ruby) to do the numeric filtering. The details aren't important.

What is important is to notice what data we're using (rows of lines, each with certain fields) and how we're transforming that data through a pipeline of steps, one simple step at a time. Look at the code too--it tends to say pretty clearly what it is doing.

A Second Exercise

Let's see how much Haskell you can understand just by first grasping an informal, hand-wavy solution and then looking at the corresponding code. It's very much a "think in data" solution, and quite different from how you've probably seen this kind of problem approached before. Without modulo division or if statements, I give you "Functional Fizz Buzz".

The task is this: you want Fizz every three steps, and Buzz every five steps. Note that sometimes the cycles coincide. Let's talk cycles then.

 threes = cycle ["", "", "Fizz"]
 fives = cycle ["", "", "", "", "Buzz"]

where cycle is defined like this (the real library definition is more efficient, but less clear):

 cycle xs = xs ++ cycle xs -- SIMPLE version of lib

So threes just spits out ["","","Fizz","","","Fizz",... ] until we stop it, and similarly for fives. Next, we want to merge two streams into one: this is quite common so there's a library function for it. zipWith pairs up elements and uses some operation to combine each pair:

 zipWith g [a,b,c,...] [d,e,f, ...] ===>
  (computes to) [g a d, g b e, g c f, ...]
 eg zipWith max [1,2,3] [2,2,2] ===> [2,2,3]
 eg zipWith (*) [1,2,3] [2,2,2] ===> [2,4,6]

Think zippers in clothes. Now, that's just what we want for merging our streams. It works for infinite streams too (why shouldn't it?)

 fizzbuzz = zipWith (++) threes fives

(++) is string concatenation, and then we just push the list of lines to the screen. And hit ^C when we get bored.

 main = putStr (unlines fizzbuzz)

If we want numbers in there between the fizzes and buzzes instead of blanks, we can just zip in another list that contains the numbers from 1 up to infinity, and just add in a number if the string would otherwise be empty.
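That refinement can be sketched like this (the names numbered and combine are mine, not from the library, and take 15 is just a convenient cut-off for demonstration):

```haskell
threes, fives :: [String]
threes = cycle ["", "", "Fizz"]
fives  = cycle ["", "", "", "", "Buzz"]

-- Zip in the numbers from 1 upwards, and use the number only when
-- the combined Fizz/Buzz string would otherwise be empty.
numbered :: [String]
numbered = zipWith combine [1 ..] (zipWith (++) threes fives)
  where
    combine :: Integer -> String -> String
    combine n "" = show n
    combine _ s  = s
```

Then putStr (unlines (take 15 numbered)) prints 1, 2, Fizz, 4, Buzz, and so on.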

So: it's a short piece of code which obviously works, built from small pieces and glued together in a simple way (think Unix pipes), and there's no worry about loops, division, variables, memory limits...

Now, this isn't a one-off trick. Programming with Haskell is like this most of the time.

Data Types Are Cheap

We'll now look at the tools that modern functional languages provide. The first one is the sheer flexibility for declaring new data types and the scope for reusing them. This is very important: rather than encoding some data in a hash and hoping that you remember to keep the encoding consistent, Haskell allows you to add a new type that expresses exactly what you want to store. Such declarations are much shorter than the equivalent in imperative or OO languages. And Haskell can automatically generate certain standard functions to use with the type, such as ordering tests or conversion to strings. I'll say it again--it's very quick and easy to add the data type that fits your problem domain well, and the close fit really helps code quality.

Some examples now.

 data Bool = False | True

That's right--Bool isn't baked into the language; Haskell is powerful enough to define such "primitive" notions directly. The various boolean operators, including short-circuit semantics, are just standard Haskell definitions. The "core" of Haskell is a surprisingly small language, and the rest of the standard language is defined on top of it in straightforward Haskell.

 data RGB = Red | Blue | Green deriving (Eq, Ord, Show, Enum)

This defines three constants (Red, Blue, Green) and automatically generates equality and ordering tests, a show (i.e., to_s) function, and the ability to use .. notation; e.g., [Red .. Green] is the list [Red, Blue, Green].
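Here is a small sketch of the derived instances in action (maxColour is my own example name):

```haskell
data RGB = Red | Blue | Green deriving (Eq, Ord, Show, Enum)

-- The derived Ord follows declaration order, so Red < Blue < Green,
-- and max works on colours for free:
maxColour :: RGB -> RGB -> RGB
maxColour = max

-- show Blue  gives the string "Blue"   (derived Show)
-- [Red ..]   gives [Red, Blue, Green]  (derived Enum)
```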

 data Maybe a = Nothing | Just a

This is a (parametric) polymorphic type, and represents a box which is either empty (Nothing) or contains a single value of some type. This has various uses; e.g., passing in an optional value or returning a value that is absent or a sensible value. The parametric polymorphism means it can be used with any value (and any type) we choose, so it's not limited, say, to just containing strings. Note that parametric polymorphism is not the polymorphism seen in OO (though Haskell has a version of the latter, as explained below). A quick explanation is: parametric polymorphism means using the same code for everything, whereas the main kind of polymorphism in OO is more about allowing different values to be treated the same by virtue of calling object-specific code.
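A quick sketch of what handling a Maybe looks like in practice (describeAge is my own example):

```haskell
-- Pattern matching forces both cases to be handled explicitly;
-- there is no way to "forget the nil check".
describeAge :: Maybe Int -> String
describeAge Nothing  = "age unknown"
describeAge (Just n) = "age " ++ show n
```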

 data Person = Person { name :: String, age :: Maybe Int,
  fav_col :: RGB, address :: [String] }

Here's a simple record that stores a name, an optional age, a favourite colour, and zero or more lines of an address (square brackets mean a list in Haskell). Notice that a record field can contain an arbitrarily complex value, not just primitive types. An example value is

 joe = Person "Joe" (Just 25) Red ["Durham Cathedral", "Durham"]

Haskell also has syntactic sugar for accessing and updating record-style values which uses the names provided for the fields, in effect giving us accessors and setter functions.
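To make that concrete, here is a sketch of the field-name sugar in action, repeating the definitions from above (birthday is my own example name):

```haskell
data RGB = Red | Blue | Green deriving (Eq, Ord, Show, Enum)

data Person = Person { name :: String, age :: Maybe Int,
                       fav_col :: RGB, address :: [String] }

joe :: Person
joe = Person "Joe" (Just 25) Red ["Durham Cathedral", "Durham"]

-- Field names double as accessor functions: age joe is Just 25.
-- Record-update syntax acts as a functional "setter", returning a
-- new Person with just the one field changed:
birthday :: Person -> Person
birthday p = p { age = fmap (+ 1) (age p) }
```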

We can also have recursive types, such as lists or trees. For example, the (polymorphic) binary tree below has values of some type at the leaves, and its internal nodes each have a left tree and a right tree.

 data PTree a = PLeaf a | PNode (PTree a) (PTree a)

Lists are defined in a similar way, with a [] case and a cons case. Again, this isn't baked in, apart from a bit of syntactic sugar that Haskell provides to allow a simpler notation. Haskell also allows more complex examples, such as the following, which is effectively parametrizing which type holds the subtrees. This allows us to vary how the child trees are stored; e.g., we could have zero or more (c = lists), or precisely one for each of the RGB colors (with c a as a function from RGB to some polymorphic a). Not an everyday construct, but it does have its uses!

 data X c a = XLeaf a | XNode (c (X c a))

We're able to combine constants, records, recursion, and polymorphism quite freely, and to mix these with types in the standard libraries, like arrays, hashes, lists.... This gives us a lot of flexibility and convenience to model data in our problem domains, and to do it without much code. This modelling can be very accurate too, which helps to eliminate certain classes of errors. That is, if we can use these data type definitions to precisely say what is allowed, then our code need only deal with those cases, and we can easily check coverage of those cases. For example, representing optional values with Maybe forces us to explicitly handle the "nothing" case vs. the "something" case. Compare this to Ruby, where nils are often used for this but it's quite common to forget to check for nils before calling a method.

In fact, Haskell does not have nil values (and does not need them), so that's one class of error we never see.

Don't underestimate how important this flexibility and accuracy is!

A Very Important (and Under-Appreciated) Type

Like most functional languages, Haskell has first-class functions, meaning we can treat functions like almost any other piece of data--build them, pass them around, use them. We should not forget how such functions sit with the data types above. The notation for the type is A -> B, indicating a conversion from some type A to some other type B. A test on colour values has type RGB -> Bool, and a function giving the max of two colours has type RGB -> RGB -> RGB. Values of these types can appear in record fields, etc. For a (very contrived) example, each person could include a mapping from an int to a colour, expressing which colour that person associates with a number.

We can also represent "waiting for an X" as a function value; e.g., if you have a person record but are waiting for their address to come from somewhere, this could be represented as a function of type [String] -> Person which is supplied with the address when it is available, and will return the complete person thereafter. Using the example above, we can do it like this, using Haskell's syntax for anonymous functions:

 \address -> Person "Joe" (Just 25) Red address

Pattern Matching

Doing stuff with values in the above types is easy: we just write clauses in our functions to deal with the various patterns we expect to see.

An example: mirror-reversing a tree. There are two main cases, leaf or node, and each clause says what to do with the contents of the tree.

 mirror (PLeaf x) = PLeaf x
 mirror (PNode l r) = PNode (mirror r) (mirror l)

Notice that we're covering all possible cases of tree here. A value which is of tree type is either a leaf or a node, and we provide code to handle both cases in full. We'll never get a run-time error when an unexpected input is received. Some of the compilers track this "totality" for us, and can give warnings when functions don't cover all cases. Also, it doesn't matter what kind of data is on the leaves--this operation is just manipulating the tree structure. So quite naturally, this function is (parametrically) polymorphic and can be used on any PTree value.
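Putting the pieces together as a runnable sketch (the example tree t and the deriving clause are mine, added so we can compare trees):

```haskell
data PTree a = PLeaf a | PNode (PTree a) (PTree a)
  deriving (Eq, Show)

mirror :: PTree a -> PTree a
mirror (PLeaf x)   = PLeaf x
mirror (PNode l r) = PNode (mirror r) (mirror l)

-- a small example tree
t :: PTree Int
t = PNode (PLeaf 1) (PNode (PLeaf 2) (PLeaf 3))
```

Here mirror t gives PNode (PNode (PLeaf 3) (PLeaf 2)) (PLeaf 1), and mirroring twice gets us back where we started.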

Also note, we don't need if statements so much now. Pattern matching does most of the checking work for us. We can still have boolean-conditional tests; e.g., if 2 > 3 then a else b, or there's a shorthand for combining these with the patterns above. Finally, we can arbitrarily nest patterns for more complex conditions; e.g., the following from a compiler for Java, which tests for lifting a side effect out of a binary operator expression.

 rwE (BINOP op l (ESEQ s r))
  | commutes s l
  = chg $ ESEQ s (BINOP op l r)
  | otherwise
  = chg $ ESEQ (MOVE (TEMP t) l)
               (ESEQ s (BINOP op (TEMP t) r))
  where t = new_tmp_var

Recursion, Control, and Higher Order Functions

Pattern matching above gives us decision making, and we can also make arbitrary calls to other functions (including recursive calls). What about loops and so on? Most of the time, we don't really need them.

Think data and transformations again. When we have a collection of things to process in some way, say a list of numbers that we want to turn into a list of colors, or a tree containing strings for which we want the leaf count, most of the transformations we want fall into two main patterns: mapping and folding.

Mapping is about applying the same operation to everything in a collection, but keeping the shape the same. For example, we can add two to each element of the list [1,2,3] to get [3,4,5]. The order and count of elements stays the same. The operation to perform is supplied as a function. Ruby programmers will know this pattern already, and know how convenient it is for replacing an explicit loop.

The other and more powerful pattern is folding. Ruby programmers will know this for arrays as inject; e.g.,

 [1,2,3].inject(1, &:*)

to get the product of a list of numbers. Another way to think of folding is to replace the "nodes" of some data structure with functions (or constants). Writing [1,2,3] in lisp style as (cons 1 (cons 2 (cons 3 nil))), folding with operation f and initial value i gives us (f 1 (f 2 (f 3 i))). In the product example, this is (* 1 (* 2 (* 3 1))). (Strictly speaking, Ruby's inject folds from the left, producing (f (f (f i 1) 2) 3), but for an associative operation like * the result is the same.) Notice that we're collapsing or folding the data into a different type (a list of numbers into a single number), though with an appropriate choice of f and i we can produce a list of numbers again, or even more complex values--like lists of trees of numbers.
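Haskell's foldr on lists is exactly this node-replacing picture (product' is my own name, to avoid clashing with the standard product):

```haskell
-- foldr f i replaces each cons with f and the final [] with i:
--   foldr f i (1 : (2 : (3 : [])))  ==  f 1 (f 2 (f 3 i))
product' :: [Integer] -> Integer
product' = foldr (*) 1
```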

Now, this folding idea, of replacing constructor nodes with functions, applies to any data structure, so we can easily adapt it to other types, like trees or records. Here's folding on the simple trees from above.

 foldPTree node_case leaf_case (PLeaf x)
  = leaf_case x
 foldPTree node_case leaf_case (PNode l r)
  = node_case (foldPTree node_case leaf_case l)
  (foldPTree node_case leaf_case r)

So a leaf count can be done with foldPTree (\x y -> x + y) (\_ -> 1); i.e., count one for each leaf value then add up the results from subtrees. Once you understand the folding pattern, then such definitions suddenly become a lot clearer than the explicit version. Compare with this.

 leaf_count (PLeaf x) = 1
 leaf_count (PNode l r) = leaf_count l + leaf_count r

The code is simple enough, but you still need to check each line carefully to see that it has all the details right and doesn't contain mistakes, like calling leaf_count l instead of leaf_count r. In a sense, mentally you have to rewrite it as a fold! It's similar to explicit loops in Ruby, when you realize that they can be written more directly as a map or an inject. Isn't it nice to shorten and simplify the code, to make it say what you mean more directly? Yes.
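Written out as a runnable definition, the fold version looks like this (leafCount is my own name for it, repeating the tree and fold definitions from above):

```haskell
data PTree a = PLeaf a | PNode (PTree a) (PTree a)

foldPTree :: (b -> b -> b) -> (a -> b) -> PTree a -> b
foldPTree node_case leaf_case (PLeaf x)   = leaf_case x
foldPTree node_case leaf_case (PNode l r) =
  node_case (foldPTree node_case leaf_case l)
            (foldPTree node_case leaf_case r)

-- count 1 for each leaf, and add up the results from subtrees
leafCount :: PTree a -> Int
leafCount = foldPTree (+) (\_ -> 1)
```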

So these maps and folds are a natural way to transform your data, and thus are highly useful tools for programming--in terms of both saying simply what you mean and avoiding verbose code. In fact, explicit recursion in FP is a bit of an anti-pattern, particularly when it's not needed--it can be annoying to read extra code when a more direct version works, plus more code means more chance of slipping up. Tony Hoare contrasted code with obviously no deficiencies vs. code with no obvious deficiencies--it helps to aim for the former!

Now, you might not be able to spot the maps and folds in some piece of code right away--it takes practice--but when you have sketched out an explicit version, do go back and think about whether it has bits of mapping, bits of folding, or particular sub-cases like filtering, and see if you can simplify the code. Also think in terms of data and transformations, and the opportunities might become more obvious. Don't feel compelled to use folds, etc., if you're not confident, but do try to reflect on your code afterwards and see if you can refactor. There will be cases when your pattern of recursion does not fit a fold, or will look worse if coded as a fold, but these are pretty rare. Similar advice applies for loops in other languages too: look for the mapping aspects, filtering, folding, and try to use the higher-level operations instead. Again, you should find that explicit loops aren't required that often (and if they are, perhaps wider refactoring would help; i.e., is there a bad design choice upstream that is forcing your hand?)

Three last technical points. First, folding corresponds to the "vanilla" pattern of processing values in a data type, and this pattern is inherently connected to how the datatype is defined. It's not a coincidence. Second, we're passing functions into the maps and folds to control what happens in the various cases, hence maps and folds are examples of Higher Order Functions. We don't need to dwell on this much--it just means functions that do stuff with functions. Third, mapping can be defined in terms of folding (exercise: try to write map in Ruby in terms of inject).
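In Haskell, that last exercise comes out like this (mapViaFold is my own name; the standard map is already defined for you):

```haskell
-- Replace each cons with "apply f, then cons", and [] with []:
mapViaFold :: (a -> b) -> [a] -> [b]
mapViaFold f = foldr (\x acc -> f x : acc) []
```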

Functional Glue

The other side of higher order function use is how we use the building blocks to create larger blocks. Haskell is sometimes called an excellent "glue" language, because of the ease with which code units can be assembled. You've already seen pipelining--building large transformations from a sequence of smaller transformations, each contributing some piece towards the final result. Here's a Haskell example, written in various styles.

 foo1 input = unlines (map (\line ->
  unwords (map reverse (words line))) (lines input))
 foo2 input = unlines $ map (\line ->
  unwords $ map reverse $ words line) $ lines input
 foo3 = unlines . map (unwords . map reverse . words) . lines

The above reverses each word on each line of the input. Conceptually, we split the input into lines, then each line into words and then reverse each word, then reassemble the words and lines. The first version is the explicit version, the second cuts down some of the parentheses by using a syntactic trick, and the third shows the idiomatic version. Clearly, the first two still have some noise, but the third says very concisely what we are thinking. Reading right to left, we split the lines, do something to each line, and reassemble. And for each line, we split to words, reverse them, then reassemble. Again, no trickery--just using the language to say what we mean.

The dot deserves special mention. It is functional composition in Haskell, and basically means joining the output of one function to the input of another, so forming a bigger function. It is defined like this, as a piece of Haskell, and not baked in either:

 f . g = \x -> f (g x)

So, f . g is a new function that takes some value x, applies g to it, then applies f to the result. The two functions can be anything (providing their input/output types are compatible, of course). For example, show . (\x -> x + 3) shows the result of adding three to a value (assumed numeric). Or, not . (\x -> x > 10) . (\x -> x + 3) tests whether a number plus 3 is more than 10, and inverts the result. We're not stuck with this definition of composition either--we can define and use "reverse composition" too, where f gets applied first and then g--whatever we find most useful.
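The examples from the paragraph above, as code you can load (all the names are mine):

```haskell
showPlusThree :: Int -> String
showPlusThree = show . (\x -> x + 3)

-- True when the number plus 3 is NOT more than 10
notOverTen :: Int -> Bool
notOverTen = not . (\x -> x > 10) . (\x -> x + 3)

-- a home-made "reverse composition": apply f first, then g
(|>) :: (a -> b) -> (b -> c) -> (a -> c)
f |> g = g . f
```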

Here's an example that often annoys me in Ruby. Sometimes I want to do one mapping on a list, then another mapping, so I have to write

 stuff.map(&:foo1).map(&:foo2)

In Haskell it looks like this: map foo2 $ map foo1 $ stuff. But often, conceptually it is nicer to have one map doing two operations, so can rewrite it to map (foo2 . foo1) $ stuff. Ruby doesn't support this kind of rewrite without a lot of extra syntax. (I suggest this flexibility is the acid test of full functional support in a language.)

Finally, if we need other kinds of glue, we can just define it ourselves.

Clean Syntax

Haskell's syntax is one of its key features, and a model for other languages to follow. Several aspects are deliberately designed to cut down on syntactic baggage and allow more focus on the ideas in the code. This really emphasizes the declarative style of programming. Some key points:

  • Function application is kind of lisp style, but without so many parentheses; e.g., map reverse (words "foo bar"), where map takes two arguments and words takes one.

  • Partial application is natural and unobtrusive; e.g., map foo4 is a function that applies some function foo4 to any list it is supplied later. Compare other languages, where we'd have to write something like ->(list) { list.map { |x| foo4(x) } } to get the same effect.

  • Modern precedence and associativity rules for operators, which can be extended to include new user-defined operators--which really helps in the design of embedded domain-specific languages (DSLs)--although one can go too far and end up with APL...

  • Semicolons. Yes, semicolons. And braces (curly brackets). The Haskell grammar actually requires these elements to delimit and separate groups of expressions, and most implementations of the language have a pre-processing stage that inserts braces and semicolons according to how indentation is being used in the code, typically through a variant of Landin's "off-side rule." (Key point: Haskell's use of layout is more flexible than Ruby's.)

Performance

You may have been worried by the seemingly infinite loop in the fizz-buzz example. However, Haskell compilers generate "lazy" code, which means (as a first approximation) that work is only done when needed, so we only generate as much of the list as we need (here, until ^C is used), plus garbage collection reclaims unused space.
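A tiny illustration of the idea (the names are mine):

```haskell
naturals :: [Integer]
naturals = [1 ..]            -- conceptually infinite

-- take demands only five elements, so only five are ever built;
-- the rest of naturals is never computed
firstFive :: [Integer]
firstFive = take 5 naturals
```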

Note that there are several kinds of "lazy" available, including a few options we can set during compilation, and that the Haskell language definition only requires "non-strict" rather than "lazy".

Put another way, the language passes more control over execution order to the compiler and runtime. We don't have to micro-manage the steps of computation, and the compiler is free to transform our code in powerful ways to improve performance. It's quite common for FP compilers to optimize away some of the intermediate data structures in a chain of functions (called "deforestation") and so save on memory and time. Some FP compilers (OCaml in particular) can even match the top C compilers in performance now because of the complex optimizations they can do when side-effects are less of an issue.

The key point from this is the silver rule: "Trust your compiler." Your job is to write good code, and you can let the compiler deal with some of the lower-level details. It is much better at complex optimizations than you are--and less error-prone(!) If the program runs slowly, there are a good range of tools for profiling and analyzing that help to pinpoint bottlenecks. You can then work on optimizing the crunch points. Haskell's GHC compiler offers several simple techniques for speeding up bottlenecks, from reduction of laziness, through user-supplied rewrite rules, to even making it easy to code key steps in C. Meanwhile, most of your code can remain high-level and clear.

Summing Up

There is a lot more I would bring up, but my editor is tapping his foot. Next time? For now, I hope you have a clearer view of how functional programmers think: that most of it is about deciding which data structures you need, then coding up various transformations. And how we use the language to program at a high level and use it to express the big ideas in a straightforward and clear way. It is kind of addictive, just like Ruby. You have a lot of flexibility and convenience, but you also get strong tools to do even more complex stuff and keep control of it.

Should you all go and learn Haskell? I would be chuffed if you were intrigued enough now to spend a few hours on it, and manage to absorb some of the different ideas. I think the main benefit is to be aware of the data structures that your program is manipulating, and aim to write your code to bring this more to the fore; then you will see more functional ideas appearing quite naturally in your code. It also helps to think about the strengths and weaknesses of your existing tools, to use their strengths well and to avoid getting caught by their weaknesses.

This goes for Haskell too. It has strengths, and it has weaknesses, and researchers are already looking at stronger languages.

I said before that Ruby feels like naughty fun for a Haskell programmer. This is partly due to a lot of common ground between the two. So I really hope that Ruby programmers get something useful out of these articles which feeds back into their Ruby or Rails work. It is possible that you hit obstacles and get frustrated, but do try to step back, think of the bigger picture ("data, dude!") and try a different approach. To this end, I would like to do a kind of code clinic in a future article, and use the PragPub forums to get people past some of the obstacles. It'd be great to hear your views.

In the coming months, I aim to cover (provisionally)

  • understanding the Maybe type

  • practical tips for working in Haskell (including type-directed development)

  • tests, types, and learning from both

  • Haskell code clinic

  • writing a Ruby refactorer in Haskell

  • dependent types

  • and yes, yet another explanation of monads


Dr Paul Callaghan rides a big, fast motorbike, and suggests that this informs his programming language opinions. He doesn't like to wait around, or do more stuff than he has to, and loves the freedom. He started with Ruby and Rails around 6 years ago and soon decided to take up Rails work full time, and is currently working on a range of bespoke systems for a small software house. Before this, he was a university lecturer and did research on advanced programming techniques using dependent types. He did his PhD in the area of Natural Language Processing as part of a group that competed in several international information extraction competitions, developing significant chunks of the NLP system's 60k lines of Haskell. He blogs here and also flies big traction kites and can often be seen being dragged around inelegantly on the beaches of North-east England, much to the amusement of his kids.

Send the authors your feedback or discuss the article in the magazine forum.