Don’t let fear keep you from getting all the benefits of your mistakes.

You’re recovering from a major operation. Which nursing unit do you choose: one that reports an error once every 500 patient days, or one that reports an error once every 50 patient days?

What if I were to tell you that in the first unit, which on the face of it makes one-tenth as many mistakes, nurses don’t report errors because they’re concerned that “heads will roll”? Would that change how you feel? How many errors are being swept under the carpet? Do you think the nurses there are likely to be learning from their mistakes, or repeating them over and over again? (This example comes from Hard Facts, Dangerous Half-Truths and Total Nonsense by Jeffrey Pfeffer and Robert I. Sutton, Harvard Business School Press, ISBN: 1-59139-862-2.)

In software, we have our own name for mistakes—we call them bugs. And every bug is an opportunity to learn.

Learning from Bugs

The fact that a bug crept into the code in the first place means that something went wrong somewhere in your process. Perhaps the requirements were ambiguous or misunderstood? Maybe there was an oversight within the architecture? Were your tests inadequate? Or perhaps the bug was even in the tests in the first place?

That’s why the debugging process comprises four phases: Reproduce, Diagnose, Fix, and (the phase we’ll be concentrating on here) Reflect.

By reflecting on how the bug got into the software, you can identify the source of the error and learn the lessons necessary to ensure that it can never happen again.

Fear is the Enemy

But this learning won’t happen in a climate of fear. Yes, someone somewhere probably screwed up, but we all make mistakes occasionally. Pointing the finger is unlikely to be productive or helpful.

If they fear that they will be pilloried or punished for their mistakes, your colleagues will start worrying more about how to protect their backs than about what’s best for the team or wider organization. In the worst cases, this can even lead to lying, setting up fall guys, and other dysfunctional behavior.

So how do you strike the balance? How do you remorselessly unearth the lessons of each bug without descending into a blame culture? It’s tempting to suggest that you should “forgive and forget,” but in fact you should aim to forgive and remember.

Forgiveness is crucial—as anyone who has ever worked on a non-trivial software project knows, bugs are inevitable. No matter how hard we try, some problems will always slip through the cracks. But remembering is also vital; otherwise we’re doomed to repeat history.

The great thing about software is that, often, we can build that memory directly into the software itself.

Software that Remembers for You

Here’s an example: I currently work on a large Ruby on Rails application. Our controllers are divided up into several modules—we have an Admin module, for example, that contains all the controllers that implement our administrative interface.

Unsurprisingly, these controllers share quite a bit of functionality—each should check, for instance, that the user has administrative privileges. So we factored this common functionality into a base class called AdminController.

All great, except that we found that occasionally we would create a new administrative controller, but forget to ensure that it derived from the right base class (Rails’ code-generation wizards always create controllers that derive from ApplicationController). We could have addressed this by having a checklist of things to do every time we create a new controller. But we can do much better.

Here’s how we guaranteed that we’ll never make the same mistake again—the following test automatically checks that we derive from the right base class in all of our admin controllers:

  class Admin::AdminControllerTest < ActionController::TestCase
    # Every controller in the Admin module must derive from Admin::AdminController.
    def test_derivation
      Admin.constants.each do |klass_name|                    # (1)
        klass = Admin.const_get(klass_name)                   # (2)
        next unless klass.is_a?(Class)  # skip constants that are not classes
        ancestors = klass.ancestors                           # (3)

        if ancestors.include?(ApplicationController)
          assert ancestors.include?(Admin::AdminController),  # (4)
                 "Bad derivation for #{klass}"
        end
      end
    end
  end

How does this work? On line 1, we use Ruby’s reflection to iterate over the names of all the constants defined in the Admin module (in Ruby, class names are constants in their containing module). Line 2 converts the name to a Class object, and then on line 3 we find out which classes it derives from. Finally, at 4, we check that all controllers (classes that derive from ApplicationController) also derive from Admin::AdminController.

Can you always build this kind of thing into the software? Unfortunately, you can’t. And even when you can, it’s occasionally more trouble than it’s worth. But you might be surprised by just how often it’s possible, and how often it saves your blushes once you start thinking in this way.
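
To experiment with the reflection check outside a Rails application, here is a minimal sketch in plain Ruby. It is an illustration under stated assumptions, not the code from our project: the empty ApplicationController stand-in, the example classes UsersController, ReportsController, and CsvExporter, the MAX_PAGE_SIZE constant, and the misderived_controllers helper are all hypothetical names invented for the demonstration. It also uses Module#const_get and the class comparison operator <= instead of inspecting the ancestors list, which asks the same question in a slightly different way.

```ruby
# Stand-in for the Rails base class; in a real application Rails provides it.
class ApplicationController; end

module Admin
  MAX_PAGE_SIZE = 50                                    # not a class: must be skipped
  class AdminController < ApplicationController; end    # the shared base class
  class UsersController < AdminController; end          # derives correctly
  class ReportsController < ApplicationController; end  # the mistake we want to catch
  class CsvExporter; end                                # not a controller: ignored
end

# Hypothetical helper: returns the controllers defined in `mod`
# that do not derive from `base`.
def misderived_controllers(mod, base)
  mod.constants
     .map { |name| mod.const_get(name) }   # constant name -> constant value
     .select { |const| const.is_a?(Class) && const <= ApplicationController }
     .reject { |klass| klass <= base }     # keep only the offenders
end

offenders = misderived_controllers(Admin, Admin::AdminController)
puts offenders.map(&:name).inspect
# prints ["Admin::ReportsController"]
```

In a test suite, the same helper could back a single assertion that the returned list is empty, with the offending class names in the failure message.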

A Healthy Culture

The most far-reaching way to ensure that you learn the lessons of each and every bug is to foster a healthy team culture. You’re aiming for simultaneous forgiveness, in which people are willing to admit and discuss the inevitable errors, and critical introspection, in which you act as though bug-free software is an attainable goal, leaving no stone unturned and ignoring no tool or technique that might get you closer.

In Hard Facts, Dangerous Half-Truths and Total Nonsense, Pfeffer and Sutton suggest that there are several types of people who help sustain this kind of learning:

  • Noisy complainers repair problems right away and then let every relevant person know that the system failed.

  • Noisy troublemakers always point out others’ mistakes, but do so to help them and the system learn, not to point fingers.

  • Mindful error-makers tell managers and peers about their own mistakes, so that others can avoid making them too. When others spot their errors, they communicate that learning—not making the best impression—is their goal.

  • Disruptive questioners won’t leave well enough alone. They constantly ask why things are done the way they are done. Is there a better way of doing things?

What can you do to help your team develop this culture? Leading by example is particularly powerful, for good or for ill. If you start ranting about the culprit after tracking down a particularly sticky problem, other members of the team are likely to adopt the same behavior. If, by contrast, a problem of your own making comes to light, own up to it to demonstrate that there’s nothing to be ashamed of. And then do everything you possibly can to eliminate any chance that you or someone else can make the same mistake in the future.

Paul Butcher has worked in diverse fields at all levels of abstraction, from microcode on bit-slice processors to high-level declarative programming, and all points in between. Paul’s experience comes from working for startups, where he’s had the privilege of collaborating with several great teams on cutting-edge technology. He is the author of Debug It!: Find, Repair, and Prevent Bugs in Your Code.