The Zenph team consists of three developers and one expert user. Yes,
the user is very much part of the team, and in fact, drives the
development effort, but more on that a bit later.
The team is geographically dispersed across the U.S., connected by
a virtual private network (VPN). That’s less surprising these days
than in years past (and in fact is the rule on virtually every project
we’ve worked on), but none of our projects have ever featured a
fully automated recording studio complete with webcam! With this
setup, the developers can feed high-definition MIDI files to the Disklavier Pro,
watch and record the results using the studio microphones, and
download the resulting audio file to wherever they happen to be.
Version control should be the first thing any project implements, and according to Zenph co-founder Peter Schwaller, they started with Perforce from Day One of the project. Unlike many unsuccessful projects, there was never even a
question about using a mature, robust version control system for development. Again unlike many others, Peter and the
team put everything under version control, from PowerPoint presentations for potential investors to their
Articles of Incorporation and so on.
Now we get to the interesting part. How on earth do you unit test
in this sort of environment? JUnit, for example, doesn’t have an
assertion for “does this audio file sound close to this other audio file?”
Their goal is to generate high-definition MIDI files that, when
played back, sound like those created when real performers play at the piano.
But comparing either MIDI or audio files for “closeness” is tricky business.
While MIDI gives you discrete events instead of a pile of waveforms, it’s not as
easy as it sounds.
Unlike standard MIDI, these
high-definition MIDI files are built with around 10 attributes per note, all
shifting and sliding around in time as well. Using simple assert
statements as one would find in JUnit, CppUnit, and so on won’t cut it:
the high-definition MIDI files don’t have to be exact (and in fact won’t be), they
have to sound the same.
Now it turns out that some perceptual acoustics come into play. The
human ear is very discriminating when it comes to
timing, for instance. A difference of even 2-3 milliseconds is
very noticeable in the right context.
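To make the contrast with a plain equality assertion concrete, here is a minimal Python sketch. It is not the team's code: the Note fields are a hypothetical subset of the roughly 10 attributes per note, and the tolerance values are assumptions chosen only to echo the 2-3 millisecond figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    # Hypothetical subset of the per-note attributes.
    pitch: int        # MIDI note number
    onset_ms: float   # when the key goes down
    velocity: float   # how hard the key was struck


def close_enough(expected: Note, actual: Note,
                 timing_tol_ms: float = 2.0,
                 velocity_tol: float = 1.0) -> bool:
    """Return True when two notes match within the given tolerances."""
    return (expected.pitch == actual.pitch
            and abs(expected.onset_ms - actual.onset_ms) <= timing_tol_ms
            and abs(expected.velocity - actual.velocity) <= velocity_tol)


reference = Note(pitch=60, onset_ms=1000.0, velocity=64.0)
candidate = Note(pitch=60, onset_ms=1001.5, velocity=64.4)

# An exact comparison rejects the candidate; the tolerant one accepts it.
exact_match = reference == candidate
tolerant_match = close_enough(reference, candidate)
```

A fixed tolerance like this is still only a yes-or-no answer per note; judging a whole performance needs a score, which is where a grading approach comes in.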
So Peter Schwaller came up with the idea of a
small testing framework that compares high-definition MIDI using a set of
heuristics, examining all 10 attributes per note, and judging the
“performance.” This gave the team a way to quantify the end goal of
As we describe in our
you first have to know what
“done” means in order to actually
Without an objective measure of success, this team could have floundered
internally for months arguing whether they were converging on a
solution or not, or worse, delivering product that the end user
But not only did the team solve that problem, they took the idea one
step further and made their boss—John—write the Grader. In other words, they
made the project manager define success in a quantifiable, achievable
manner. That alone can make the difference between a successful
project and a death march.
It’s also important to note that the Grader and its heuristics didn’t
spring into life fully complete. John added heuristics to the Grader
incrementally as development progressed: a 3ms discrepancy is better
than a 300ms one, and one isolated discrepancy is better than 5 in a
row, and so on. They tuned and grew the grader as they went, slowly
but steadily converging on the goal.
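The two heuristics just mentioned can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual Grader: the grade function, its 1 ms tolerance, and its weighting scheme are all hypothetical, and it looks only at onset timing rather than all of the per-note attributes.

```python
def grade(reference_ms, candidate_ms, tolerance_ms=1.0):
    """Return a penalty score; 0.0 means every onset is within tolerance.

    Both arguments are equal-length lists of note onset times in
    milliseconds, assumed to be already paired note for note.
    """
    if len(reference_ms) != len(candidate_ms):
        raise ValueError("onset lists must be the same length")
    penalty = 0.0
    run_length = 0  # consecutive out-of-tolerance notes so far
    for ref, cand in zip(reference_ms, candidate_ms):
        error = abs(ref - cand)
        if error <= tolerance_ms:
            run_length = 0
            continue
        run_length += 1
        # Larger errors cost more, and each further miss in a row
        # is weighted more heavily than an isolated one.
        penalty += (error - tolerance_ms) * run_length
    return penalty


reference = [i * 100.0 for i in range(10)]
# Five 10 ms misses spread out, versus the same five misses back to back.
isolated = [t + 10.0 if i % 2 == 0 else t for i, t in enumerate(reference)]
in_a_row = [t + 10.0 if i < 5 else t for i, t in enumerate(reference)]
```

With these inputs the spread-out misses score a lower penalty than the consecutive ones, and new heuristics can be added one at a time as further terms in the loop.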
Some teams would want to start testing with a Rachmaninoff piano
concerto or some such, and that’s a huge mistake. You always need to
start with small, isolated unit tests before moving on to more
advanced functional or acceptance testing.
In this case, the first six months or so of unit tests comprised some
beautiful piano solos made of just one note. Just one single note at
a time, mapping out the full range of the instrument. It was quite a
while before the unit tests got the software to the point where they
could try “Mary Had A Little Lamb,” and quite a while after that
before poor Mary got any rhythm.
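As a rough picture of what "one note at a time across the full range" might look like as test data, here is a small Python sketch. The names and the three velocity levels are hypothetical, and the range assumes a standard 88-key keyboard (MIDI note numbers 21 through 108).

```python
LOWEST_KEY = 21    # A0 on a standard 88-key piano
HIGHEST_KEY = 108  # C8


def single_note_cases(velocities=(32, 64, 96)):
    """Yield (pitch, velocity) pairs, one isolated note per test case."""
    for pitch in range(LOWEST_KEY, HIGHEST_KEY + 1):
        for velocity in velocities:
            yield pitch, velocity


cases = list(single_note_cases())
```

Each pair would drive one small, isolated test, long before anything resembling a melody is attempted.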
Agile methodologies recommend that you have an expert as part of the
development team, so that you can get rapid feedback as the software
matures and quick decisions on gray areas.
(As with any project, the challenge is in drilling down to the user’s
true need without being distracted by their view of possible
The Zenph team has an expert user—Anatoly Larkin is finishing up his Doctor of
Musical Arts degree in Piano, no less. His
close association with the team and attention to detail shortened the feedback gap and let the
team move quickly to satisfy the requirements.
Different programming languages are better suited for certain
applications than others; the trick is to use the right tool for the
job. Telecom and audio apps tend to favor C++, so the Zenph team’s
production-level audio analysis software is written in C++.
But that’s not the end of the story—they don’t use C++ for
everything. For rapid prototyping of low-level algorithms, the
developers use a scripting language (they happen to use Perl; other
teams we know use Ruby or Python). They use the same scripting
language to power the automated build and test, and for controlling
and coordinating remote-control access to the studio and equipment.
On the C++ side, they’ve continued the practices that John and the
developers used at Ganymede: lots of good instrumentation in the code
itself. This includes custom runtime assertions with full call-stack
reporting and so on. On the age-old debate over whether you
should leave these assertions enabled at runtime, the Zenph team comes
down heartily on the side of enablement. Since they are deploying
this software in a service-bureau model, there’s no downside to
displaying very technically detailed assertion failures at runtime
(software designed for your grandmother, on the other hand, may need to
adopt a somewhat less threatening posture).
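The idea of an always-enabled assertion that reports the call stack can be sketched briefly. The team's assertions are custom C++; the version below is in Python purely for illustration, and check, RuntimeAssertionError, and set_velocity are hypothetical names.

```python
import traceback


class RuntimeAssertionError(Exception):
    """Raised by check(); carries the call stack at the failure point."""


def check(condition, message):
    """Always-on assertion: unlike `assert`, not stripped by `python -O`."""
    if condition:
        return
    # Capture the stack up to the caller of check(), omitting check() itself.
    stack = "".join(traceback.format_stack()[:-1])
    raise RuntimeAssertionError(f"{message}\nCall stack:\n{stack}")


def set_velocity(value):
    # Example precondition: standard MIDI velocities run from 0 to 127.
    check(0 <= value <= 127, f"velocity out of range: {value}")
    return value
```

Calling set_velocity(200) raises an error whose text includes both the message and the chain of calls that led to it, which is the kind of detail that suits a service-bureau deployment.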
Another enduring question that plagues many teams is “build or buy?”
When is it most advantageous to build custom software, and when is it
better to just buy it? As developers, we tend to prefer to build our
own stuff all the time (pragmatic author Venkat Subramaniam
refers to this as
The team struck a good balance, leveraging open source,
commodity-level functionality where possible. Items such as path and
extension handling, threading libraries, GUI widgets, and so on were
ably handled by libraries from boost.org, the wxWidgets set, and
others. They wisely reserved custom crafting where it was most
needed, including their core audio algorithms and components such as
the MIDI data pretty printer.
Now printing out binary MIDI data in a nicely formatted,
human-readable form may sound like a frivolous extra, or at least
something one could get off the shelf. But as the team explained to
me, they didn’t want some third-party piece of software to mask,
distort, filter, or otherwise lie about the data. They needed to know
precisely what was going on, with no surprises. Too often we
take critical diagnostics on faith, only later to realize that the
information wasn’t accurate or complete.
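To give a feel for how little code a trustworthy dump needs, here is a minimal Python sketch that prints the header chunk of a Standard MIDI File. It is an assumption-laden illustration: the team's printer handles their high-definition format, which is not described here, and describe_header is a hypothetical name. Note that it shows the raw bytes alongside the decoded fields, so nothing is hidden.

```python
import struct


def describe_header(data: bytes) -> str:
    """Format the 14-byte header chunk of a Standard MIDI File."""
    if len(data) < 14:
        raise ValueError("need at least 14 bytes for the header chunk")
    # Big-endian: 4-byte id, 32-bit length, then three 16-bit fields.
    chunk_id, length, fmt, ntracks, division = struct.unpack(
        ">4sIHHH", data[:14])
    if chunk_id != b"MThd":
        raise ValueError(f"not a MIDI header chunk: {chunk_id!r}")
    lines = [
        f"raw bytes : {data[:14].hex(' ')}",
        f"chunk id  : {chunk_id.decode('ascii')}",
        f"length    : {length}",
        f"format    : {fmt}",
        f"tracks    : {ntracks}",
    ]
    if division & 0x8000:
        # High bit set means SMPTE timing; show the raw value, don't guess.
        lines.append(f"division  : 0x{division:04x} (SMPTE time code)")
    else:
        lines.append(f"division  : {division} ticks per quarter note")
    return "\n".join(lines)


sample = bytes.fromhex("4d546864 00000006 0001 0002 01e0")
report = describe_header(sample)
```

Printing report for the sample bytes lists the chunk id, length, format, track count, and timing division, each traceable to the hex on the first line.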