small medium large xlarge

Dependent Types, Part II

Tests and Proofs

by Paul Callaghan

Generic image illustrating the article
  Paul continues last month's discussion of dependent types.  

This month we continue looking at some core ideas in dependent types and the associated theory and technology, as we translate the Haskell word-wrap code from Issue 45, start to consider relevant tests and proofs, and discuss the details arising.

As we get deeper into the more powerful language features, it’s worth restating some of the key points of the style of programming advocated here. It should help to place the various new ideas in context.

First, strategic and deliberate use of types to help write better code. This includes a stronger description of what some code "does" with its inputs and outputs, and actively using type information to guide development—even using type information to fill in bits of the program for us, or better still, allow us to omit some of the routine details. (I’ve already highlighted some similarities between this way of using types and current ideas on TDD; see Issue 39. I’ve yet to see anyone use the information contained in test cases to automatically generate bits of code though!)

Another aspect is the willingness to introduce new types to help clarify code and ensure no details are left unconsidered. It’s important that new types can be added without much ceremony, and that existing types can be combined in various new ways and used consistently. Notice that this is almost diametrically opposed to ‘Stringly-typed’ programming and its close relatives!

Secondly, careful use of the language to highlight the ‘why’ of something working. For me, a program should tell a story and be clear about how it works. Although I appreciate the practical and economic reasons for getting the functionality in place before the aesthetics, it does make me slightly uncomfortable. One reason is that programs are not usually written to obey a collection of observations: there’s an underlying need to be met or problem to be solved, independent of the technology in use; an idea or goal that we can talk about and understand. I like to see that higher level thinking reflected in programs (and I like my technology to help reduce the gap between brain and program, as I tried to explain elsewhere).

In contrast, test cases are mainly a set of observations that we expect the program to obey, and such a set of observations is not the same as a coherent model. Compare this to how science progresses, from observations to models and laws which are then assessed against previous observations and against the predictions they make for new situations. We may hope that the test cases approximate some kind of model, but it’s still an approximation and doesn’t have the same strength as a model. For example, how are we sure that the test cases are consistent, that we’re not trying to get the program to do contradictory things?

Or sure that there are no important gaps? We could try further testing, but I suspect that most of you would start to check your tests against some kind of model in your minds of what the program should do. So even if you’re using tests, you still probably do have some model! Why not try to use that model more directly then, and get some leverage from it? What’s stopping you?

I don’t claim that this position is 100-percent perfect and all of the details have been worked out, and it’s unlikely that any language or paradigm will reach 100 percent (and that includes ‘graduate student’). But on reflection, I’d rather have a great language for 80 percent of my code with warts of varying (but manageable) complexity for the remainder, rather than an ‘okay’ language for 90 percent.

Translating the Code to Idris

As a first step, before we start playing with more advanced ideas, let’s see if we can get the word-wrap code running in Idris. Here’s the small test suite. Only minor changes are needed here, since Idris is pretty Haskell-like—and the changes are just in the type signature.

(Note: I’m using Idris version 0.9.7 for this article, same as for last month’s, though there have been a few useful changes to Idris since that article appeared. Let me know on the magazine forum if you have problems though.)

 tests : (Nat -> String -> String) -> List Bool
 tests wrap = [ wrap 1 "" == ""
  , wrap 1 "x" == "x"
  , wrap 1 "xx" == "x\nx"
  , wrap 1 "xxx" == "x\nx\nx"
  , wrap 1 "x x" == "x\nx"
  , wrap 2 "x x" == "x\nx"
  , wrap 3 "x xxxx" == "x\nxxx\nx"

At present, the type signature is not optional in Idris. This point is worth a few comments. Remember that type signatures are (mostly) optional in Haskell, and the exceptions occur when you’re using tricky combinations of type-class-based overloading? This is because most of the time, the Haskell implementation can infer the type automatically, though on occasion will need some hints on resolving the overloading. The same might be possible in Idris for relatively simple code. So why does the current implementation take a firmer line and require them everywhere?

Part of this is probably implementor convenience, though note that in this more type-orientated language we’ll be using type information in stronger ways so it’s mostly in our interest to play along and provide such type information. In comparison, Haskell’s type inference follows the traditional idea that you write some code and then see what the compiler makes of it. Computer says yes, or computer says no. But in our context, types are much more central to the activity of programming, much like the role of tests in TDD, and the system will use the additional information to guide the construction of the code.

Straying briefly into a theoretical distinction—but only to make sure I’ve properly got this point across—notice that type inference and type checking are not the same thing. The former takes some complete code and tries to work out its type, and it’s pretty much a bottom-up operation: we traverse the syntax tree and eventually return a type (or some error messages). Type checking could be implemented in a similar way, where the checking part compares an inferred type to the claimed type, but there are other options. One of these is to work top-down through the syntax tree, effectively propagating the constraints from the claimed type and onto the code itself.

Idris’ type checker is more in this style, particularly since it’s also used to ‘elaborate’ input syntax (i.e., raw code with implicit arguments and overloading) into the full, unambiguous internal abstract syntax. To sum up, Idris probably could drop the restriction in simple cases but it helps the machine (to help us) if we provide the type signature, and it helps us by early catching of errors.

Plus, type inference is starting to look a bit overrated and opinion is turning towards elaboration via type checking as the more powerful notion. Inference is a convenient feature to have, but it does place some annoying limits on various programming language features. For example, see how Scala and Haskell have to limit their implementation of ‘Generalized Algebraic Data Types’ (GADTs) to retain type inference. In contrast, languages like Idris—which use checking via elaboration—aren’t so constrained by type inference and so can support the full range of inductive families (which we will look at soon).

The switch of emphasis from inference to checking also mirrors what I mentioned above about stronger ways to use type information to guide program development.

Next change: instead of Int I’ve switched to Nat. We’ll be using Nat quite a bit so why not start now. The Nat type is defined like this.

 data Nat = O | S Nat

It means, a natural number is zero (that’s a capital O) or the successor (S) of a natural number. If you ever did number theory, you might recognize these as part of the Peano axioms.

Notice that it automatically excludes negative numbers, and right away we have a win because using it will allow us to completely ignore the possibility of wrap being passed a nonsensical length value. (There are other ways to do this too, usually involving propositions, but this is the most direct way.)

Now, don’t worry, we’re not going to be doing heavy-duty arithmetic with this version of numbers. Instead, this type is going to be used to represent and check constraints, and this will only involve small amounts of computation. Plus: quite often these constraints will completely disappear in the compilation process and thus have little or no run-time cost!

The recursive nature of Nat is also important in related proofs, and we’ll be using it a few times here so it’s important to understand the details. Do you remember the proof by induction rule from school math? The statement of it, i.e., the proposition it represents, is this:

 nat_ind : (P: Nat -> Type) -> P O ->
  ((x:Nat) -> P x -> P (S x)) -> (n:Nat) -> P n

So, given some property P, a proof that P holds for zero, and a proof that P x -> P (S x) for all x, then we can prove P holds for all non-negative numbers. Notice that there’s no upper limit to the available numbers, unlike Int etc., so when we say ‘for all’ we really do mean it. The proof of this rule is a simple recursive program, shown below. Notice how it works, e.g., when given a concrete number—say S (S (S O))—then its output is succ_case _ (succ_case _ (succ_case _ zero_case)),namely, it constructs a proof that P holds for that exact value. Curry-Howard in action again!

 nat_ind P zero_case _ O = zero_case
 nat_ind P zero_case succ_case (S x)
  = succ_case x (nat_ind P zero_case succ_case x)

This idea of proof by induction can be used with other types too. For example, the zero case for lists is for the empty list, and the successor (or step) case maps to consing an element onto a list. Induction on trees works in a similar way, e.g., simple cases for leaves and more complex cases for branching nodes. You may have realized that this pattern of induction is quite close to what you know about folding on such types! It is.

Finally, I use List Bool instead of [Bool] to avoid issues with the overloading of [ ... ] notation for lists and vectors. There’s other overloading afoot here too—Idris supports automatic casting from conventional numerals (1,2,3...) to equivalent Nat values so we don’t have to write S (S (S O)) for 3 etc.

Aside: Constructive Mathematics

I spoke above of uses of nat_ind constructing a proof for some proposition for some concrete value. This idea of constructing proofs from direct evidence is a powerful idea in mathematics, and has been studied for many years. See wikipedia for an introduction and for more detail see Constructive mathematics from Stanford’s encyclopedia of philosophy.

Constructive mathematics is distinguished from ‘classical mathematics,’ and a key difference is the view taken of the logical proposition A or not A, known as the Law of Excluded Middle (with the key section for us here.

So, do you accept that for some proposition P, e.g,. ‘the sky is blue,’ that it’s true that P is true or not-P is true? I.e, that ‘the sky is blue or the sky is not blue’ is true? Constructivists basically do not accept this as a rule of reasoning, and so do not accept any reasoning or proof that relies on it.

If you’re used to Boolean logic, then pause a while and think about this issue. Excluded middle does hold there (check the truth table, for example). But, Boolean logic is only one model of reality, and a simple one at that, and not always the one we need to use.

For added effect, choose some emotive statements—such as ‘bikes are cooler than cars’’and substitute them for P, and see what you feel about the result. (Also look to see how many times people try to apply this as a rule of logic in programming language discussions!)

Finally, take a look at Raymond Smullyan’s Drinker paradox. The proposition is ‘There is someone in the pub such that, if he is drinking, everyone in the pub is drinking, or more formally exists x . (D(x) larr forall y. D(y)) (read as ‘there’s an x such that: D(x) implies for all y then D(y)’).

Does this proposition make sense to you? Do you think it is true? It is an example of something provable in classical logic (by using excluded middle) but not true in constructive mathematics. Also note that this is the same kind of reasoning that allows classical mathematicians to conclude that some P holds for some X if they prove that not-P does not hold for all Y (informally: that something exists if it is not impossible...). In contrast, a constructivist proof would be a program that constructed something that exhibited the required properties, ie. tells you how to do it. Do you see the difference?

Translating the Word Splitting

The first attempt is below, adding some basic definitions that aren’t in the Idris basic library (or Prelude) at present, and adjusting the types.

 null : List a -> Bool
 null [] = True
 null (_::_) = False
 iterate : (a -> a) -> a -> List a
 iterate f a = a :: iterate f (f a)
 splitIntoSize : Nat -> List a -> List (List a)
 splitIntoSize n
  = map (take n)
  . takeWhile (not . null)
  . iterate (drop n)

But now we run into another important issue: totality. There are several aspects to this, but for now, the point is that we’re not sure that the above code terminates. Informally, termination means that it always produces an output in finite time for all inputs, and does not crash or go into an infinite loop.

Totality (and termination) is important in dependent type systems because it’s a key requirement for using code inside types. If the code crashes or loops, then the type checker might crash or loop. If we’re wanting to view types as propositions and programs as proofs, and wanting to rely on the results, then we obviously do not want crashes or infinite loops!

As a concrete example, say we change nat_ind from above to the following incorrect version, where the recursive call is applied to (S x) instead of x and the zero case is removed.

Happily, it won’t even type check (the result of the recursive call doesn’t match what is expected)—which is great, because it shows how the more detailed types already save us from such mistakes. But suppose it did pass type checking, then what would happen when the code was used? It would be an infinite loop, because the recursion would never terminate. If we repaired the argument on the recursive call, then the code would crash because there’s no clause to handle the zero case.

Clearly, we couldn’t trust any proof that used that code.

 nat_ind P zero_case succ_case (S x) =
  succ_case x (nat_ind P zero_case succ_case (S x))

See the section 8.5 of the Idris tutorial for more information about Idris’ totality checking. Notice the remarks on totality checking being ‘conservative,’ in the sense of maybe disallowing some fairly reasonable code rather than risk allowing non-total code to be passed. If some definition is rejected, we just need to be more explicit about why it is safe. (We’ll see some ways of doing this when we get to inductive families.)

I will be using the %default total global directive in the code for this article to require that all of the code passes the totality check. It’s probably wise to aim for most code to pass the totality checks because it is a good check on code being ‘sensible.’ Most of our code should terminate, exceptions being things like event loops or tricks with laziness, so it’s good to have this confirmed and thus remove one source of potential bugs. It’s also useful to know that our code handles all relevant cases and doesn’t miss anything, hence addressing another source of potential bugs.

Maybe Idris should even have %default total as the default mode? Not all code will be total, so Idris provides various options to accommodate exceptions, such as directives on individual functions. We’ll see some of these later.

One other detail to be aware of: the Idris compiler is designed to handle non-lazy code, so basically, some non-total definitions can cause the run-time stack to overflow. Be careful though: the current version of the REPL’s evaluation mechanism supports some laziness, so some examples that work in the REPL might crash when compiled.

You can avoid surprises by requiring totality by default in all definitions, and carefully choose whether you allow a tricky definition to be excepted from this. Non-total code can also crash due to unimplemented cases.

In the above code, the question mark of totality hangs over the iterate function. Clearly, it does not terminate—it is intended to generate a potentially infinite list. It’s kind of safe in a lazy context because we only generate as much as we need and tend not to let it run unchecked, e.g., we tend not to try to take the length of the list of results produced by iterate, but the risk is still there of doing this accidentally or indirectly. We can’t take that risk if proofs depend on it, hence we need something more. How do we isolate why it is ok in this situation, and convey that to the machine? What would make it safe?

One way is to recognize that the recursion in iterate isn’t the nice, well-behaved recursion we enjoy with lists, trees etc., and adjust how we approach the question of totality. That is, the reason for ‘why’ it is safe (in certain circumstances) is not the same. Technically, iterate uses the "dual" notion of vanilla recursion, which is often called co-recursion, and instead of generating simple data (i.e., finite), the potentially infinite value is described as ‘co-data.’

Idris has some experimental support for co-data, e.g., see Release 0.9.5.

But we’re not going to explore that avenue now! Instead, here’s another technique that sometimes comes in handy: we add some kind of measure to the definition so that the pattern of recursion turns into a well-behaved version. The measure acts like a counter that says how many times we can (or should) need to iterate; hence prevents us from unbounded recursion. The basic function is this, and it passes the totality checker.

 bounded_iterate : Nat -> (a -> a) -> a -> List a
 bounded_iterate (S n) f a = a :: bounded_iterate n f (f a)
 bounded_iterate Z _ a = []

For the case above, where (drop n) is the thing we want to apply repeatedly on a list (assuming n > 0), then one good measure is how many items are in the list. Since drop n (where n > 0) cuts at least one element off if the list is non-empty, then we’ll at most have as many steps as there are elements in the list. (You could improve this by using the list length divided by n plus 1 for any left-overs, if you want, but it doesn’t change the result.)

Let’s make the use of a measure more explicit, and revise the code. It passes the totality checks, and we can test it in the REPL with expressions like print $ splitIntoSize 4 [1..101]. (It won’t print directly in Idris 0.9.7 but will show you which calls to the FFI it should make....)

 hacky_iterate : (a -> Nat) -> (a -> a) -> a -> List a
 hacky_iterate measure f x = bounded_iterate (measure x) f x
 splitIntoSize : Nat -> List a -> List (List a)
 splitIntoSize n s
  = map (take n)
  $ takeWhile (not . null)
  $ hacky_iterate length (drop n)
  $ s

Now, what happens if we try a split size of 0? I’ll leave you to find out. But notice that there was nothing to prevent our doing this, and indeed, nothing to guarantee that our measure function (just length) was sufficient to produce a correct answer, with or without a sensible value for n. We need something else, and indeed have several sensible options to try. Do not for a minute assume that this is just how stuff is, and that we have to put up with such gaps!

A First Stab at Proofs

One way to get more confidence in the results is to write down a few test cases and check that the results are what we expect.

We could also adapt the QuickCheck framework to Idris. QuickCheck is a library for property-based testing, where you can state properties to test and the library runs the properties on many different inputs. This helps you to explore properties of your code, and also helps to find counter-examples for properties that don’t hold—which could be due to bugs in your code or properties that can’t be satisfied—or both.

With excellent timing, @jessitron has just published a good introduction to the key ideas and how they relate to example-based testing.

For more detail, see the relevant chapter of Real World Haskell and research papers on integrating QuickCheck into Agda (another dependently typed language and proof assistant).

Interestingly, some people like to use QuickCheck to ‘debug’ proofs as well, in the sense of working towards a good statement of some desired property by eliminating versions for which QuickCheck finds counter-examples, and they only start to do serious work to prove a theorem when they have reached a point where QuickCheck fails to falsify it.

Better still, if we want more certainty, we can try some proofs. There are many ways to approach this, and many properties we might want to explore. We’ve already proved one theorem of course, via the type checker, that splitIntoSize n turns a list of things into a list of lists of the same thing.

Here’s a simple property about length, that splitIntoSize n l yields a list of lists where the longest fragment (or sublist) is not longer than the limit n. More formally the proposition is, for all limits n and all lists l, then the maximum length of a fragment is less than or equal to n. I’ll postpone the details of how we represent and reason about inequalities, but here’s the statement of the proposition with some auxiliary definitions (maximum value in a list, then maximum length of sublists in a list).

 maxList : (Ord a) => a -> List a -> a
 maxList z = foldr max z
 maxLength : List (List a) -> Nat
 maxLength xs = maxList 0 ( List.length xs)
 -- the proposition...
 ok_split : (n:Nat) -> (l:List a) ->
  LTE (maxLength $ splitIntoSize n l) n

How to prove something like this? I’m not sure whether we can (I’ve not tried), and we may yet need to exclude the n == O case.

Typically, we’d try induction on the limit n to see if it falls out easily, or maybe try induction on the input list instead. Either way, we’ll end up trying to prove a step case by pushing constraints around and maybe proving smaller results about how maxLength works on non-empty lists. We might also have to prove some simpler things about splitIntoSize first.

That sounds a bit too much like hard work though, and it probably is. I wouldn’t like to have to do it either (and I never fancied the traditional ‘formal methods’ taught in universities either, for much the same reasons).

Can we do better though? I think so. In some sense, the proof above is trying to establish something complex about code written in a (relatively!) traditional way, like with boring types. You might call it an attempt to retrofit some justification onto existing code, which just sounds like it’s not going to end well (the "tests first" people have a good point here!)

One of the pioneers of dependently typed programming, Conor McBride, has a great phrase that covers this retrofitting: ‘It’s like trying to fit toothpaste back in the tube.’

One Way Forward

The suggested answer to this mismatch is to build more of the understanding of ‘why it works’ into the program, using a richer type system to convey some of that information—plus we’ll use advanced tools like proof assistants to help deal with the extra layer of detail.

It’s time to wind up this article, so little space left except to give a quick peek at next month’s content. We’ll start to look at Inductive Families, a generalization of simple data types that allow us to embed data and logic in the types, and look at how they change the style of programming.

The presence of extra information in the types will make it easier to reason about certain properties (compared to the more indirect style required by the naive approach above), but also help to simplify the code by eliminating redundant cases and filling in the obvious things.

Quite a bit of the word-wrap code deals with limits on sizes, so it’s natural to want our word fragments to carry some size information. Hence we’ll make extensive use of vectors (or sized lists). Vectors are defined in the Idris Prelude like this:

 data Vect : Type -> Nat -> Type where
  Nil : Vect a O
  (::) : a -> Vect a n -> Vect a (S n)

Basically, it says Vect A n is a type, where A is the type of the payload data and n the length of the vector; then Nil constructs a vector of size zero, and (::) (i.e., vector cons) adds an element onto a vector of size n to give a vector of size S n (or 1 + n). We’re using Nat for sizes again, firstly because we don’t want negative sizes, but we do want arbitrary non-negative numbers—plus we want the inductive structure of Nat for use in various proofs.

I’ll leave you with a simple but very important definition, of taking the head or first element of a vector, similar to Haskell’s head or Lisp’s car. In functional languages, head [] fills much the same role as nil pointer exceptions (NPEs) in other languages.

(If a functional programmer gives you grief about NPE’s and why you don’t use option types like Maybe in your language, feel free to ask them about head []. It might be fun, unless they’ve already learned a bit of dependent types.)

We could get head to return a value in an option type, but this is a bit cumbersome. Wouldn’t it be better if we knew it was never actually called on input that was empty? And didn’t require any run-time checks either? This is quite straightforward with dependent types. We just define head on a sized vector as follows. The type says it can only work if the size is non-zero (literally, of size 1 + n for any n), and always returns a value of the payload type. Totality isn’t a problem either: there’s no Nil case, but the type constraint means that a Nil is never required (and indeed, just can’t occur), hence the definition is total as it stands.

So there you go: a fully safe head function! And this is only the beginning.

 head : Vect a (S n) -> a
 head (x::xs) = x

Dr Paul Callaghan first learnt about type theory in 1996, and was immediately fascinated. It filled in many of the unsatisfactory gaps in conventional functional programming, and helped him understand why some programs in Haskell felt ‘wrong’ and ‘incomplete.’ Paul was fortunate to do research work with some very talented people from 1997 to 2007, during which time he implemented a niche proof assistant called Plastic and experimented with various ideas like DSLs, Mathematical Vernacular, and Coercive Subtyping. Other bits of his bio can be seen on earlier articles. Paul currently works on a variety of bespoke rails apps at TranscendIt, and thanks TranscendIt for its continued flexibility with his writing schedule! Paul also flies big traction kites and can often be seen being dragged around inelegantly on the beaches of North-east England, much to the amusement of his kids. He blogs at and tweets as @paulcc_two.

Send the author your feedback or discuss the article in the magazine forum.