“I brought you into this world,” my father would say, “and I can take you out. It don’t make no difference to me. I’ll just make another one like you.”

— Bill Cosby, Fatherhood

We all manage resources whenever we code: memory, transactions, threads, files, timers—all kinds of things with limited availability. Most of the time, resource usage follows a predictable pattern: you allocate the resource, use it, and then deallocate it.

However, many developers have no consistent plan for dealing with resource allocation and deallocation. So let us suggest a simple tip:

Finish What You Start

This tip is easy to apply in most circumstances. It simply means that the routine or object that allocates a resource should be responsible for deallocating it. Let’s see how it applies by looking at an example of some bad code—an application that opens a file, reads customer information from it, updates a field, and writes the result back. We’ve eliminated error handling to make the example clearer.


  FILE *cFile;   /* global shared by the routines below */

  void readCustomer(const char *fName, Customer *cRec) {

    cFile = fopen(fName, "r+");
    fread(cRec, sizeof(*cRec), 1, cFile);
  }

  void writeCustomer(Customer *cRec) {

    rewind(cFile);
    fwrite(cRec, sizeof(*cRec), 1, cFile);
    fclose(cFile);
  }

  void updateCustomer(const char *fName, double newBalance) {

    Customer cRec;

    readCustomer(fName, &cRec);

    cRec.balance = newBalance;

    writeCustomer(&cRec);
  }

At first sight, the routine updateCustomer looks pretty good. It seems to implement the logic we require: reading a record, updating the balance, and writing the record back out. However, this tidiness hides a major problem. The routines readCustomer and writeCustomer are tightly coupled [1]: they share the global variable cFile. readCustomer opens the file and stores the file pointer in cFile, and writeCustomer uses that stored pointer to close the file when it finishes. This global variable doesn’t even appear in the updateCustomer routine.

Why is this bad? Let’s consider the unfortunate maintenance programmer who is told that the specification has changed—the balance should be updated only if the new value is not negative. She goes into the source and changes updateCustomer:


  void updateCustomer(const char *fName, double newBalance) {

    Customer cRec;

    readCustomer(fName, &cRec);

    if (newBalance >= 0.0) {
      cRec.balance = newBalance;

      writeCustomer(&cRec);
    }
  }

All seems fine during testing. However, when the code goes into production, it collapses after several hours, complaining of too many open files. Because writeCustomer is not getting called in some circumstances, the file is not getting closed.

A very bad solution to this problem would be to deal with the special case in updateCustomer:


  void updateCustomer(const char *fName, double newBalance) {

    Customer cRec;

    readCustomer(fName, &cRec);

    if (newBalance >= 0.0) {
      cRec.balance = newBalance;

      writeCustomer(&cRec);
    }
    else
      fclose(cFile);
  }

This will fix the problem—the file will now get closed regardless of the new balance—but the fix now means that three routines are coupled through the global cFile. We’re falling into a trap, and things are going to start going downhill rapidly if we continue on this course.

The “Finish What You Start” tip tells us that, ideally, the routine that allocates a resource should also free it. We can apply it here by refactoring the code slightly:


  void readCustomer(FILE *cFile, Customer *cRec) {
    fread(cRec, sizeof(*cRec), 1, cFile);
  }

  void writeCustomer(FILE *cFile, Customer *cRec) {
    rewind(cFile);
    fwrite(cRec, sizeof(*cRec), 1, cFile);
  }

  void updateCustomer(const char *fName, double newBalance) {
    FILE *cFile;
    Customer cRec;

    cFile = fopen(fName, "r+");        // >---
    readCustomer(cFile, &cRec);        //     |
    if (newBalance >= 0.0) {           //     |
      cRec.balance = newBalance;       //     |
      writeCustomer(cFile, &cRec);     //     |
    }                                  //     |
    fclose(cFile);                     // <---
  }

Now all the responsibility for the file is in the updateCustomer routine. It opens the file and (finishing what it starts) closes it before exiting. The routine balances the use of the file: the open and close are in the same place, and it is apparent that for every open there will be a corresponding close. The refactoring also removes an ugly global variable.
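The examples above leave out error handling. The following is a minimal sketch, not the authors' code, of how the balanced routine might look with the checks put back. The layout of Customer, the bool return values, and the binary open mode "r+b" are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical record layout; the original text does not define Customer.
struct Customer {
  char   name[32];
  double balance;
};

// Returns true when one complete record was read.
bool readCustomer(std::FILE *cFile, Customer *cRec) {
  return std::fread(cRec, sizeof(*cRec), 1, cFile) == 1;
}

// Returns true when one complete record was written.
bool writeCustomer(std::FILE *cFile, const Customer *cRec) {
  std::rewind(cFile);
  return std::fwrite(cRec, sizeof(*cRec), 1, cFile) == 1;
}

// Returns true if the record was read and, when required, rewritten.
bool updateCustomer(const char *fName, double newBalance) {
  assert(fName != nullptr);

  std::FILE *cFile = std::fopen(fName, "r+b");   // allocate
  if (cFile == nullptr)
    return false;          // nothing was opened, so nothing to close

  Customer cRec;
  bool ok = readCustomer(cFile, &cRec);
  if (ok && newBalance >= 0.0) {
    cRec.balance = newBalance;
    ok = writeCustomer(cFile, &cRec);
  }

  // Single exit path: every successful fopen meets exactly one fclose.
  // fclose also flushes, so its result is part of the outcome.
  if (std::fclose(cFile) != 0)                   // deallocate
    ok = false;
  return ok;
}
```

Funnelling every outcome after a successful open through one fclose keeps the balance visible even once the error checks are present.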

Nest Allocations

The basic pattern for resource allocation can be extended for routines that need more than one resource at a time. There are just two more suggestions:

  1. Deallocate resources in the opposite order to that in which you allocate them. That way you won’t orphan resources if one resource contains references to another.
  2. When allocating the same set of resources in different places in your code, always allocate them in the same order. This will reduce the possibility of deadlock. (If process A claims resource1 and is about to claim resource2, while process B has claimed resource2 and is trying to get resource1, the two processes will wait forever.)
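Both suggestions can be sketched with a pair of locks. This is an illustration under stated assumptions rather than a recommended design: the Account type and transfer routine are hypothetical, and ordering the locks by address is only one possible way of fixing a global order.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical account type: each account owns the lock that protects it.
struct Account {
  std::mutex lock;
  long       balance = 0;
};

// Move `amount` between two distinct accounts.
void transfer(Account &from, Account &to, long amount) {
  assert(&from != &to);    // locking the same mutex twice would deadlock

  // Suggestion 2: pick one global order (here, by address) so that
  // transfer(a, b) and transfer(b, a) claim the locks in the same order.
  Account *first  = &from;
  Account *second = &to;
  if (std::less<Account *>()(second, first))
    std::swap(first, second);

  first->lock.lock();      // allocate 1
  second->lock.lock();     // allocate 2

  from.balance -= amount;
  to.balance   += amount;

  // Suggestion 1: release in the opposite order to acquisition.
  second->lock.unlock();   // deallocate 2
  first->lock.unlock();    // deallocate 1
}
```

The standard library's std::lock function offers a ready-made way to acquire several mutexes without deadlock; the manual ordering here is only meant to make the rule visible.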

It doesn’t matter what kind of resources we’re using—transactions, memory, files, threads, windows—the basic pattern applies: whoever allocates a resource should be responsible for deallocating it. However, in some languages we can develop the concept further.

Objects and Exceptions

The equilibrium between allocations and deallocations is reminiscent of a class’s constructor and destructor. The class represents a resource, the constructor gives you a particular object of that resource type, and the destructor removes it from your scope.

If you are programming in an object-oriented language, you may find it useful to encapsulate resources in classes. Each time you need a particular resource type, you instantiate an object of that class. When the object goes out of scope, or is reclaimed by the garbage collector, the object’s destructor then deallocates the wrapped resource.

This approach has particular benefits when you’re working with languages such as C++, where exceptions can interfere with resource deallocation.
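As a rough sketch of the idea, assuming C++11 or later, a file can be wrapped in a small class whose constructor opens it and whose destructor closes it. The names FileHandle and writeGreeting are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>

// Hypothetical wrapper: owns one FILE* for the lifetime of the object.
class FileHandle {
  std::FILE *fp;

 public:
  // Constructor allocates the resource or throws; no half-built objects.
  FileHandle(const char *name, const char *mode)
      : fp(std::fopen(name, mode)) {
    if (fp == nullptr)
      throw std::runtime_error("cannot open file");
  }

  // Destructor deallocates it on every exit path, exceptions included.
  ~FileHandle() { std::fclose(fp); }

  // One owner per file: copying would lead to a double fclose.
  FileHandle(const FileHandle &) = delete;
  FileHandle &operator=(const FileHandle &) = delete;

  std::FILE *get() const { return fp; }
};

// Writes one line; the file is closed when `out` goes out of scope,
// whether the routine returns normally or throws.
void writeGreeting(const char *name) {
  assert(name != nullptr);
  FileHandle out(name, "w");
  if (std::fputs("hello\n", out.get()) == EOF)
    throw std::runtime_error("write failed");
}
```

Note that writeGreeting contains no explicit fclose: the allocation and deallocation are paired once, inside the class, instead of in every routine that uses a file.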

Balancing and Exceptions

Languages that support exceptions can make resource deallocation tricky. If an exception is thrown, how do you guarantee that everything allocated prior to the exception is tidied up? The answer depends to some extent on the language.

Balancing Resources with C++ Exceptions

C++ supports a try...catch exception mechanism. Unfortunately, this means that there are always at least two possible paths when exiting a routine that catches and then rethrows an exception:


  void doSomething(void) {

    Node *n = new Node;

    try {
      // do something
    }
    catch (...) {
      delete n;
      throw;
    }

    delete n;
  }

Notice that the node we create is freed in two places—once in the routine’s normal exit path, and once in the exception handler. This is an obvious violation of the DRY principle and a maintenance problem waiting to happen.

However, we can use the semantics of C++ to our advantage. Local objects are automatically destroyed on exiting from their enclosing block. This gives us a couple of options. If the circumstances permit, we can change “n” from a pointer to an actual Node object on the stack:

 
  void doSomething1(void) {

    Node n;

    try {
      // do something
    }
    catch (...) {
      throw;
    }
  }

Here we rely on C++ to handle the destruction of the Node object automatically, whether an exception is thrown or not.

If the switch from a pointer is not possible, the same effect can be achieved by wrapping the resource (in this case, a Node pointer) within another class.


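  // Wrapper class for Node resources
  class NodeResource {
    Node *n;

   public:
    NodeResource() { n = new Node; }
    ~NodeResource() { delete n; }

    // Copying would leave two wrappers deleting the same Node
    NodeResource(const NodeResource &) = delete;
    NodeResource &operator=(const NodeResource &) = delete;

    Node *operator->() { return n; }
  };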

  void doSomething2(void) {

    NodeResource n;

    try {
      // do something
    }
    catch (...) {
      throw;
    }
  }

Now the wrapper class, NodeResource, ensures that when its objects are destroyed the corresponding nodes are also destroyed. For convenience, the wrapper provides a dereferencing operator ->, so that its users can access the fields in the contained Node object directly.

Because this technique is so useful, the standard C++ library provides ready-made wrappers for dynamically allocated objects. The original template class for this, auto_ptr, was deprecated in C++11 and removed in C++17; its replacement is unique_ptr, declared in the header <memory>.


  void doSomething3(void) {
    std::unique_ptr<Node> p(new Node);

    // Access the Node as p->...

    // Node automatically deleted at end
  }
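The same idea extends to resources that are not released with delete. A minimal sketch, assuming C++11 or later: std::unique_ptr accepts a custom deleter, so it can own a FILE* and call fclose on it. The names FileCloser, FilePtr, and countBytes are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Deleter that releases a FILE* with fclose instead of delete.
struct FileCloser {
  void operator()(std::FILE *fp) const { std::fclose(fp); }
};

// Hypothetical alias: a smart pointer that owns an open file.
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Counts the bytes in a file; returns -1 if it cannot be opened.
long countBytes(const char *name) {
  assert(name != nullptr);
  FilePtr in(std::fopen(name, "rb"));
  if (!in)
    return -1;                  // null pointer: the deleter is not called

  long n = 0;
  while (std::fgetc(in.get()) != EOF)
    ++n;
  return n;                     // fclose runs here, via FileCloser
}
```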

Balancing Resources in Java

Unlike C++, Java implements a lazy form of automatic object destruction. Unreferenced objects are considered to be candidates for garbage collection, and their finalize method will get called should garbage collection ever claim them. While this is a convenience for developers, who no longer get the blame for most memory leaks, it makes it difficult to implement resource clean-up using the C++ scheme.

Fortunately, the designers of the Java language thoughtfully added a language feature to compensate, the finally clause. When a try block contains a finally clause, code in that clause is guaranteed to be executed if any statement in the try block is executed. It doesn’t matter whether an exception is thrown (or even if the code in the try block executes a return): the code in the finally clause will get run. This means we can balance our resource usage with code such as


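  public void doSomething() throws IOException {

    File tmpFile = new File(tmpFileName);
    FileWriter tmp = new FileWriter(tmpFile);

    try {
      // do some work
    }
    finally {
      try {
        tmp.close();        // release the writer first
      }
      finally {
        tmpFile.delete();   // then remove the file it was writing
      }
    }
  }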

The routine uses a temporary file, which we want to delete, regardless of how the routine exits. The finally block allows us to express this concisely.

When You Can’t Balance Resources

There are times when the basic resource allocation pattern just isn’t appropriate. Commonly this is found in programs that use dynamic data structures. One routine will allocate an area of memory and link it into some larger structure, where it may stay for some time.

The trick here is to establish a semantic invariant for memory allocation. You need to decide who is responsible for data in an aggregate data structure. What happens when you deallocate the top-level structure? You have three main options:

  1. The top-level structure is also responsible for freeing any substructures that it contains. These structures then recursively delete data they contain, and so on.
  2. The top-level structure is simply deallocated. Any structures that it pointed to (that are not referenced elsewhere) are orphaned.
  3. The top-level structure refuses to deallocate itself if it contains any substructures.

The choice here depends on the circumstances of each individual data structure. However, you need to make it explicit for each, and implement your decision consistently. Implementing any of these options in a procedural language such as C can be a problem: data structures themselves are not active. Our preference in these circumstances is to write a module for each major structure that provides standard allocation and deallocation facilities for that structure. (This module can also provide facilities such as debug printing, serialization, deserialization, and traversal hooks.)
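Such a module might look like the following sketch, which implements the first option (the top-level structure frees everything it contains) in a C-like style. The TreeNode layout, the function names, and the liveNodes counter are all hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical tree node; `liveNodes` exists only so callers can check
// that every allocation was matched by a deallocation.
struct TreeNode {
  int       value;
  TreeNode *firstChild;
  TreeNode *nextSibling;
};

static int liveNodes = 0;

// The only place a TreeNode is allocated.
TreeNode *treeCreate(int value) {
  TreeNode *n = static_cast<TreeNode *>(std::malloc(sizeof(TreeNode)));
  if (n == nullptr)
    return nullptr;
  n->value = value;
  n->firstChild = nullptr;
  n->nextSibling = nullptr;
  ++liveNodes;
  return n;
}

// Links `child` under `parent`; from now on the parent owns the child.
void treeAdopt(TreeNode *parent, TreeNode *child) {
  assert(parent != nullptr && child != nullptr);
  child->nextSibling = parent->firstChild;
  parent->firstChild = child;
}

// The only place a TreeNode is freed. Option 1: free the substructures
// first, then the node itself. Accepts nullptr, like free().
void treeDestroy(TreeNode *n) {
  if (n == nullptr)
    return;
  TreeNode *child = n->firstChild;
  while (child != nullptr) {
    TreeNode *next = child->nextSibling;   // save before child is freed
    treeDestroy(child);
    child = next;
  }
  std::free(n);
  --liveNodes;
}
```

Because allocation and deallocation each happen in exactly one routine, the ownership rule is written down once and every caller inherits it.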

Finally, if keeping track of resources gets tricky, you can write your own form of limited automatic garbage collection by implementing a reference counting scheme on your dynamically allocated objects. Scott Meyers’s book More Effective C++ dedicates a section to this topic.
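A reference counting scheme can be as small as the sketch below. It is a simplified illustration with hypothetical names (Buffer, bufferCreate, bufferRetain, bufferRelease); it is not thread-safe and does not reclaim objects that refer to each other in a cycle.

```cpp
#include <cassert>

// Hypothetical shared buffer with an intrusive reference count.
struct Buffer {
  int refs;
  int data[16];
};

static int liveBuffers = 0;

// Creation hands the caller the first reference.
Buffer *bufferCreate() {
  Buffer *b = new Buffer();   // value-initialized: all fields start at zero
  b->refs = 1;
  ++liveBuffers;
  return b;
}

// Every additional holder takes a reference...
Buffer *bufferRetain(Buffer *b) {
  assert(b != nullptr && b->refs > 0);
  ++b->refs;
  return b;
}

// ...and gives it back; the last release frees the object.
void bufferRelease(Buffer *b) {
  assert(b != nullptr && b->refs > 0);
  if (--b->refs == 0) {
    delete b;
    --liveBuffers;
  }
}
```

Each holder still finishes what it starts: every bufferCreate or bufferRetain is paired with one bufferRelease. In C++11 and later, std::shared_ptr provides a standard library version of the same idea.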

Checking the Balance

Because Pragmatic Programmers trust no one, including ourselves, we feel that it is always a good idea to build code that actually checks that resources are indeed freed appropriately. For most applications, this normally means producing wrappers for each type of resource, and using these wrappers to keep track of all allocations and deallocations. At certain points in your code, the program logic will dictate that the resources will be in a certain state: use the wrappers to check this. For example, a long-running program that services requests will probably have a single point at the top of its main processing loop where it waits for the next request to arrive. This is a good place to ensure that resource usage has not increased since the last execution of the loop.
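One way this check might look is sketched below, assuming memory is the resource being tracked. The wrapper names trackedAlloc and trackedFree, the handleRequest routine, and the serve loop are hypothetical stand-ins for a real request-processing program.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical tracking wrappers around malloc and free.
static long outstanding = 0;   // allocations not yet freed

void *trackedAlloc(std::size_t size) {
  void *p = std::malloc(size);
  if (p != nullptr)
    ++outstanding;
  return p;
}

void trackedFree(void *p) {
  if (p != nullptr) {
    std::free(p);
    --outstanding;
  }
}

// Hypothetical request handler that balances its own allocations.
int handleRequest(int request) {
  int *scratch = static_cast<int *>(trackedAlloc(sizeof(int)));
  if (scratch == nullptr)
    return -1;
  *scratch = request * 2;
  int reply = *scratch;
  trackedFree(scratch);
  return reply;
}

// Serves `count` requests; returns the sum of the replies.
long serve(int count) {
  long total = 0;
  const long baseline = outstanding;
  for (int request = 0; request < count; ++request) {
    // Top of the loop: usage must not have grown since the last pass.
    assert(outstanding == baseline);
    total += handleRequest(request);
  }
  assert(outstanding == baseline);
  return total;
}
```

If a later change to handleRequest forgets a trackedFree, the assertion at the top of the loop fails on the very next request instead of hours into production.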

At a lower, but no less useful level, you can invest in tools that (among other things) check your running programs for memory leaks. Purify (www.rational.com) and Insure++ (www.parasoft.com) are popular choices.

Exercises

  1. Some C and C++ developers make a point of setting a pointer to NULL after they deallocate the memory it references. Why is this a good idea?
  2. Some Java developers make a point of setting an object variable to null after they have finished using the object. Why is this a good idea?

Challenges

  • Although there are no guaranteed ways of ensuring that you always free resources, certain design techniques, when applied consistently, will help. In the text we discussed how establishing a semantic invariant for major data structures could direct memory deallocation decisions. Consider how Design by Contract, page 101, could help refine this idea.

Footnotes:

[1] For a discussion of the dangers of coupled code, see Decoupling and the Law of Demeter, page 128.


Extract from The Pragmatic Programmer Copyright © 2000 Addison Wesley Longman, Inc. Reproduced with permission.

Any trademarks are the properties of their owners