You can work in the functional programming paradigm in C++. And you may be surprised at how complete C++’s support for functional programming is.
Functional programming and C++. This combination will strike an equal mixture of disgust and terror in some of you. Others may be intrigued but daunted by the prospect.
Yet C++ has always been a multi-paradigm language. And while previous attempts to add practical functional programming features required significant effort, recent additions to the C++ standard, like lambdas, have improved the situation. This paper explores the out-of-box support for functional programming provided by the new standard. We’ll look at techniques typically found in introductory functional programming textbooks. This article assumes familiarity with C++, but not necessarily with basic functional programming.
The source code is available in github and is compiled using gcc 4.8 installed on Mac OSX using MacPorts.
Object-Oriented and Functional Programming Style
The heart of object-oriented programming is the encapsulation of data and methods in a coherent class or object. Each class or object represents an entity in the real world. Each object is responsible for the management of its state and as long as it fulfills the contract implied by its interface the implementation is of no concern to the caller. Objects interact by sending each other messages through method calls that change the internal state of the receiving object. Classes can be combined through inheritance or composition to form more complex entities.
The for-loop is a typical construct used in C++ classes. The for-loop processes elements in a container. These elements can be object instances or pointers to object instances. Usually the for-loop uses iterators to point to the next element to process in its body. The body of the for-loop typically has statements that affect the state of the element referenced by the iterator. When the for-loop reaches the end of the container all elements have been processed and some or all of them have been modified in some way. Any reference to the list acquired before the for-loop was executed will now reference the changed list. The same thing goes for references to elements in the list. So the execution of the for-loop may cause side-effects in other parts of the program, either by design or by accident. The type of programming that emphasizes the use of mutable data and statements is called an imperative programming style. It is hard to prove by simple statement inspection alone if an imperative program is correct, because its state may be affected by changes away from the statement being reviewed.
In contrast, functional programming stresses the construction of computations or functions acting on immutable values. Data and operations on the data are not co-mingled. Immutable data acts like a value like 1. You can hold a reference to 1, but 1 itself is immutable. You can add 2 to the reference but the reference itself still points to 1. This referential transparency through the use of pure functions and immutable data lies at the heart of functional programming.
Functions are first-class objects in a functional language. You can reference a function like you would any other data. Functions can have functions as arguments or return functions. Functions that take functions as arguments or that return functions are called higher order functions. You can use higher order functions to combine simpler functions into more complex ones. They play an important role in functional programming. for-loops are replaced by recursion for list processing.
Because functions play such an important role we need a formal way to represent them. This article uses Haskell’s notation for function signatures. A binary function f with arguments of type a and b and a return value of type c is represented:
The representation of function implementations uses a slightly different notation: the return type follows the double colon :: after the argument list. Here’s the type signature of the identity function:
Here are two implementations of id:
In a function definition a -> b, the arrow -> can be looked at as a type that takes two other types a and b to be fully defined. Types that are parameterized by other types, like the arrow operator -> are referred to as type constructors. In general, M a represents a type constructor M that takes a single type variable a, and M a corresponds to the C++ class template template < typename a > struct M..... The arrow operator corresponds to the function wrapper std :: function < a(b) >. Another frequently used type constructor is [ a ], which creates a list of elements of type a. [a] corresponds to the stl containers std :: list < a > or std :: forward_list < a >.
Lambda Expressions and Closures
Lambda expressions allow you to create functions on the fly. The expression in the body of the lambda can reference variables that are not specified in the argument list of the lambda expression. Those variables are called free variables. Free variables are assigned the value found in the environment (i.e., the scope) in which the lambda expression is defined. This capture of the enclosing environment by the lambda expression is called a closure.
The (slightly abbreviated) C++ syntax for the lambda expression is:
The capture specifier [...] specifies how the free variables are captured. If it’s empty , the body of the lambda can’t reference any variables outside its scope. The [=] specifier captures free variables by value, whereas the [&] captures them by reference. The ( params ) is the parameter list and -> rettype is an optional return type specifier. Lambdas can be bound to variables using std :: function or auto.
Listing 1: various ways lambdas capture the environment
Listing 1 illustrates the use of the capture specifier. The lambda func has no arguments and prints the value of the two free variables x and y to stdout. x and y are initialized to 0 and 42 respectively, preceding the lambda definition. The capture specifier of func is [x,&y] so x is captured by value and y by reference. The next three lambdas increment the free variables x and y. The lambda inc captures y by reference. inc alt, on the other hand, captures y by value. The keyword mutable allows the lambda expression to change y. inc alt alt captures the complete environment by reference, and increments both x and y. Then func is called each time y is changed. The values of x and y printed by func are shown in the comment. Since y is captured by reference is can be changed through side effects. On the other hand x is captured once and remains the same.
Listing 2: implementation of factorial using lambda recursion
Listing 2 shows a recursive implementation of the factorial function n!. Each invocation of the lambda prints the value of the argument x. The return type is specified using the optional return specifier. Notice that the lambda itself needs to be captured by reference.
Partial Function Application
A partially applied function is created when a function is called with fewer arguments than its argument list requires. In that case, a lambda is returned with the remainder of the arguments. In C++ partial function application is supported by std :: bind and std :: placeholders :: ... Both are defined in the header < functional >. The placeholders are in the namespace std :: placeholders and are named _1, _2, etc.
std :: bind takes a callable object or a function pointer as its first argument. Subsequent arguments are either values, or placeholders provided by std :: placeholders. std :: bind returns a function object. The relative position of the values and placeholders corresponds to the position of the argument in the argument list of the function f to which they’re bound. A placeholder corresponds to an argument of the callable returned by std :: bind. The number of arguments is equal to the number of distinct placeholders.
Listing 3: std :: bind example
Listing 3 illustrates the use of std :: bind and std :: placeholders. It’s used to create a function that calls another function repeatedly with the result of the previous function call. Lambda repeat is a higher-order function that repeatedly calls its third argument. This function is initially called with the value of the second argument. The number of repetitions is given by the first argument. std :: bind is used to create a function that uses the number of repetitions as the initial value. The callable object rpl returned by std :: bind uses the number of repeats as the initial value because the first and second argument of repeat are bound to the same placeholder. rpl(l1, 9) calls lambda l1 nine times, with 9 as the initial value. The result is printed to stdout, and its value is shown in the comment.
Currying (named after the mathematician Haskell B. Curry) is a technique that turns any function into a function of one variable. Currying is related to but is more powerful than partial function application. The curried version of a function is a higher-order function that returns a partially applied version of the original function.
The function signature of a function designed to curry binary functions f(x,y) is:
curry2 takes a binary function and returns a unary function. This unary function returns another unary function when it is called with an argument of type a. This function is a partially applied version of the uncurried function f, where the argument a is provided. When you call this function with an argument of type b you obtain the value returned by uncurried function f. Here’s an example where plus is being curried:
(curry2 plus) is the curried version of plus. The return types have been shown explicitly to highlight the fact that functions are returned. curry2 plus)(5) returns a lambda that represents the plus function partially applied to 5. This is then called with 6 as the argument with an unsurprising 11 as the result. A simple implementation of curry2 to curry binary functions in C++ is shown in listing 4:
Listing 4: curry for binary operators
Currying and partial function application simplify the design of higher order functions since we only have to consider unary functions. In fact currying plays an important role in functional programming.
Neither the C++ language nor its standard library provide facilities to curry functions. In fact, in C++ functions are not written in curried form. Compare this to Haskell where functions are curried by default. The programmer needs to either use a third-party library or roll her own implementation. Writing a curry operator has become a lot easier now that lambdas are supported.
Map, zipWith, and Zip
map applies a function f of type a -> b to each element of a list [a] and returns a new list [b].
The std :: for_each function appears to fill the the bill. It takes two iterators and a unary callable object as input. The callable is called with every element in the range delimited by the iterators, and its final state is returned. std :: for_each is clearly an imperative implementation of a for-loop.
A better choice is the std :: transform function. This function has two forms. The first one takes a unary callable object as input and applies it to the elements of one range and returns another.
In the second it takes a binary callable function, applies it to two input ranges, and returns the resulting range:
Listing 5 shows a possible implementation of the map function for a std :: forward_list.
Listing 5: map for std :: forward_list
Notice that map has two type parameters. The first specifies the type of the element in the input container L. The second type parameter specifies every generic callable. std :: function could have been used to provide a more type-safe interface. However, that would not allow us to use inline lambda functions. The type of each lambda is unique and therefore would not be converted to std :: function. The type of element in the result list is determined using decltype on the return type of the callable f.
Listing 6: using std :: bind to combine functions
In listing 6 std :: bind combines two functions and the result is then mapped over the list. The inner bind takes the curried plus function cplus as the first argument and puts a placeholder as the second argument. The lambda returned by the inner bind is then used as an input to the outer lambda. The first argument of the outer bind is a lambda that has a function as input. In the body of the lambda this function is called with 2. The placeholder is bound to the first argument of cplus and 2 is used as the value for the second argument. So in effect the function f(x) = 4*x+2 is mapped over the list. The result is printed to std :: cout. 94 the result of 4*23+2, 182 the result of 4*45+2, etc. This shows how function combination can be used to limit the number of iterations and list copies.
The second flavor of std :: transform applies a function to the elements of two lists to produce a third. This corresponds to zipWith:
The first argument of zipWith is a curried function, with input parameters of type a and b respectively and return type c. This function is applied to a list of elements of type a and b respectively. The result is a list of type c. A closely related and widely used function is zip, which takes two lists and returns a list of pairs.
This listing shows the implementation of zipWith and zip for a std :: forward_list using std :: transform. The type of the return list is derived by calling decltype on the function f, which is called with an instance of A and B.
Listing 7: zipping two lists
Listing 7 illustrates the use of zip on two lists.
Listing 8: curried version of zipWith
Listing 8 is closer to zipWith’s curried version shown in the function signature above. The listing shows the same zip operation as the previous one. However, the call to zipWith requires a complete specification of the template types. This increases the line noise somewhat. C++’s type system is not powerful enough to infer the types from type of the arguments to op.
Reduce and the List Monad
The type signature for reduce is:
reduce moves or folds a binary operation over a list and returns a result. The type of the first argument to the binary operation is the same as the type returned by reduce. It’s also the type of the first input variable encountered after operator specification. The first input variable is used to initialize the first argument to the binary operation, when the first element of the list [b] is being processed.
In fact map can be implemented in terms of reduce. In that case, the type a would be the list type, and the initial value would be the empty list. The binary operator would then concatenate the result of a unary operation onto the list. Because map can be implemented using reduce, reduce is more powerful than map.
The stl function that closely matches the type signature for reduce is the std :: accumulate function found in the < numeric > header.
The version we use second takes a binary operator and a couple of list iterators as input.
The function signature of std :: accumulate tracks that of reduce fairly closely. The main difference is the order of the arguments and the lack of curry.
Listing 9: example of std :: accumulate
Listing 9 shows how we can use std :: accumulate to the maximum value in a list. std :: numericlimits < int > :: min returns the smallest possible integer value and is used to initialize the search. The binary operation is just a lambda wrapped around the compare operator and std :: accumulate returns the expected result.
Listing 10: processing a list using reduce
In listing 10 std :: accumulate is used to process a list by applying a function to each element. Notice that the body of the lambda m does in fact two things: The actual operation we would want to perform (2*y+1 in this case) as well as the concatenation of the result of this operation to the target list.
Listing 11: unary operation and reduce
Listing 11 refactors the code in listing 10 by separating the unary operation and the list concatenation. The generalized function signature of the unary operator (the lambda bound to op) is a -> [b]. At first blush this looks like a clunky re-implemention of the map function but in fact it’s more powerful.
Listing 12: the list monad
Listing 12 shows the implementation of a function called mapM based on the refactoring done in listing 11. Its signature resembles that of map. Just like map, mapM takes a unary function f and a list and returns a list:
However, note that f returns a list of elements, rather than a single value. This makes mapM a lot more powerful.
Listing 13: comparing mapM and map
Listing 13 uses mapM to redo the previous example shown in listing 7.
The main difference is that the function op that is mapped over the list returns a list rather than a single element, like it did in listing 7. In fact we could extend this example by having op return more than one element, or no elements at all. Regardless, mapM would return a list of results.
If you try to do the same thing with map, a list of lists is returned.
In a typical scenario you’d want to apply a number of operations to a list. mapM allows each function to have the same signature. It takes a single element and returns a list of elements. The next application on a list returned by map would (as you can see when the results are printed in the example) require an iteration over the result list. In fact, the resulting list is fundamentally different from the input list. It’s the ability of mapM to join the result lists of the operation into a single list that provides a great deal of power.
In fact the type signature of mapM is that of the monad implementation for lists. The use of monads and other types provide a powerful extension of the functional approach. I hope to discuss the support for those in a follow-up article.
Is functional programming possible for the mainstream programmer in C++? In this article I’ve discussed basic functional techniques, like lambda expressions and closures, partial function application, currying, map, and reduce. In addition, I’ve introduced a more powerful, monadic form of the map function. I’ve shown that the new additions, notably lambdas and closures, to the standard have made the use of these functional techniques a possibility. Sometimes using functional features introduces a lot of line noise in the form of accolades, returns, or semi-colons. But C++ has never been quiet in that respect, and the standard has added features, like the auto declaration and range-based for loops, that reduce this noise somewhat. The use of currying in particular may introduce some added noise in that regard. Error messages generated by the compiler are another concern. I have not shown the reader the reams of messages produced when something goes wrong. Again, this is not something entirely new to C++, but it can be a daunting task to work through.
Functional programming emphasizes referential transparency through the use of immutable data. Changes are made to a copy of that data item. In the implementations of map and reduce shown here, new lists are created containing the changed data elements. To remain referentially transparent this requires that the copy semantics of the objects is relatively straightforward. That, in turn, requires the use of straightforward data types, which behave like values, and don’t maintain state. The creation of a completely new list of data items introduces an obvious performance penalty. In languages designed for functional programming the cost of this approach is reduced because items are in fact reused. In C++ the tradeoff of referential transparency versus performance is a real one.
The extension of the functional approach to a richer class of problems, like IO, has introduced a whole new set of concepts. The extent to which those concepts are supported in C++ is a topic I’d like to address in another article.
Alfons started developing software in earnest when he was working on his PhD in physics. He transitioned into the IT side of the finance industry where he has been working on trading systems in C++, python, and Perl. A few years ago he developed an interest in functional programming, learned lisp, and authored lisp libraries for mongodb and twitter, which can be found on his github page. When he’s not programming he’s trying to improve his ironman performance, to little or no avail.