The Zenph team consists of three developers and one expert user. Yes, the user is very much part of the team, and in fact, drives the development effort, but more on that a bit later.
The team is geographically dispersed across the U.S., connected by a virtual private network (VPN). That’s less surprising these days than in years past (and in fact is the rule on virtually every project we’ve worked on), but none of our projects have ever featured a fully automated recording studio complete with webcam! With this setup, the developers can feed high-definition MIDI files to the Disklavier Pro, watch the results, record them using the studio microphones, and download the resulting audio file to wherever they happen to be.
Version control should be the first thing any project implements, and according to Zenph co-founder Peter Schwaller, they started with Perforce from Day One of the project. Unlike many unsuccessful projects, there was never even a question about using a mature, robust version control system for development. Again unlike many others, Peter and the team put everything under version control, from PowerPoint presentations for potential investors to their Articles of Incorporation and so on.
Now we get to the interesting part. How on earth do you unit test in this sort of environment? JUnit, for example, doesn’t have an assertion for “does this audio file sound close to this other audio file?”
Their goal is to generate high-definition MIDI files that, when played back, sound like those created when real performers play at the piano. But comparing either MIDI or audio files for “closeness” is tricky business. While MIDI gives you discrete events instead of a pile of waveforms, it’s not as easy as it sounds.
Unlike standard MIDI, these high-definition MIDI files are built with around 10 attributes per note, all shifting and sliding around in time as well. Using simple assert statements as one would find in JUnit, CppUnit, and so on won’t cut it: the high-definition MIDI files don’t have to be exact (and in fact won’t be); they have to sound the same.
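To make the difference between "exact" and "close enough" concrete, here is a minimal sketch of a tolerance-based comparison. It is not Zenph's code: the `Note` fields, the function names, and the tolerance values are all assumptions chosen for illustration, and a real high-definition note would carry more attributes than the three shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; only three illustrative attributes."""
    pitch: int        # MIDI note number
    onset_ms: float   # start time in milliseconds
    velocity: int     # key velocity


def notes_close(expected, actual, onset_tol_ms=2.0, velocity_tol=2):
    """True if the pitches match and the other fields fall within tolerance."""
    return (
        expected.pitch == actual.pitch
        and abs(expected.onset_ms - actual.onset_ms) <= onset_tol_ms
        and abs(expected.velocity - actual.velocity) <= velocity_tol
    )


def assert_sequences_close(expected, actual, **tolerances):
    """Raise AssertionError naming the first pair that is out of tolerance."""
    assert len(expected) == len(actual), "note counts differ"
    for i, (e, a) in enumerate(zip(expected, actual)):
        assert notes_close(e, a, **tolerances), f"note {i} differs: {e} vs {a}"
```

A pass/fail check like this is still cruder than judging whether two performances sound the same, which is why a fixed tolerance is only a starting point.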
Now it turns out that some perceptual acoustics come into play. The human ear is very discriminating when it comes to timing, for instance. A difference of even 2-3 milliseconds is very noticeable in the right context. So Peter Schwaller came up with the concept of a Grader.
As we have described elsewhere, you first have to know what “done” means in order to actually get there.
But not only did the team solve that problem, they took the idea one step further and made their boss—John—write the Grader. In other words, they made the project manager define success in a quantifiable, achievable manner. That alone can make the difference between a successful project and a death march.
It’s also important to note that the Grader and its heuristics didn’t spring into life fully complete. John added heuristics to the Grader incrementally as development progressed: a 3 ms discrepancy is better than a 300 ms one, one isolated discrepancy is better than five in a row, and so on. They tuned and grew the Grader as they went, slowly but steadily converging on the goal.
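The two heuristics mentioned here (small discrepancies beat large ones, and isolated discrepancies beat runs of them) can be sketched as a simple penalty score. This is our own illustration rather than the actual Grader: the function name `grade`, the 2 ms threshold, and the weighting scheme are all assumptions.

```python
def grade(timing_errors_ms, threshold_ms=2.0):
    """Return a penalty for a list of per-note timing errors; lower is better.

    Assumed heuristics (illustrative only, not the real Grader):
      * errors at or below threshold_ms cost nothing;
      * larger errors cost more than smaller ones;
      * each further consecutive error costs more than an isolated one.
    """
    penalty = 0.0
    run_length = 0  # consecutive out-of-tolerance notes seen so far
    for error in timing_errors_ms:
        magnitude = abs(error)
        if magnitude <= threshold_ms:
            run_length = 0  # an in-tolerance note ends the run
            continue
        run_length += 1
        # Weight the excess over the threshold by the position in the run.
        penalty += (magnitude - threshold_ms) * run_length
    return penalty
```

With this weighting, `grade([0, 3, 0])` is lower than `grade([0, 300, 0])`, and five 3 ms errors spread out among good notes score lower than the same five errors in a row. New heuristics could be added one at a time as extra terms in the penalty.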
Some teams would want to start testing with a Rachmaninoff piano concerto or some such, and that’s a huge mistake. You always need to start with small, isolated unit tests before moving on to more advanced functional or acceptance testing.
In this case, the first six months or so of unit tests comprised some beautiful piano solos made of just one note. Just one single note at a time, mapping out the full range of the instrument. It was quite a while before the unit tests got the software to the point where they could try “Mary Had A Little Lamb,” and quite a while after that before poor Mary got any rhythm.
Agile methodologies recommend that you have an expert as part of the development team, so that you can get rapid feedback as the software matures and quick decisions on gray areas. (As with any project, the challenge is in drilling down to the user’s true need without being distracted by their view of possible implementations.)
The Zenph team has an expert user: Anatoly Larkin, who is finishing up his Doctor of Musical Arts degree in Piano, no less. His close association with the team and his attention to detail shortened the feedback gap and let the team satisfy the requirements quickly.
Some programming languages are better suited to certain applications than others; the trick is to use the right tool for the job. Telecom and audio apps tend to favor C++, so their production-level audio analysis software is written in C++.
But that’s not the end of the story—they don’t use C++ for everything. For rapid prototyping of low-level algorithms, the developers use a scripting language (they happen to use Perl; other teams we know use Ruby or Python). They use the same scripting language to power the automated build and test, and for controlling and coordinating remote-control access to the studio and equipment.
On the C++ side, they’ve continued the practices that John and the developers used at Ganymede: lots of good instrumentation in the code itself. This includes custom runtime assertions with full call-stack reporting and so on. On the age-old debate over whether you should leave these assertions enabled at runtime, the Zenph team comes down heartily on the side of enablement. Since they are deploying this software in a service-bureau model, there’s no downside to displaying very technically detailed assertion failures at runtime (software designed for your grandmother, on the other hand, may need to adopt a somewhat less threatening posture).
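The team's assertions live in C++, and we haven't reproduced them here. As a rough sketch of the same idea in a scripting language, the following check stays enabled at runtime and reports the call stack on failure. The names `check`, `RuntimeCheckError`, and `set_velocity` are hypothetical, and the 0 to 127 range is simply standard MIDI's 7-bit velocity range, used as an example condition.

```python
import traceback


class RuntimeCheckError(Exception):
    """Raised when an always-on runtime check fails."""


def check(condition, message):
    """Raise RuntimeCheckError, including the call stack, if condition is false.

    Unlike Python's assert statement, this is not stripped by the -O flag.
    """
    if not condition:
        # format_stack() ends with this function's own frame; drop it.
        stack = "".join(traceback.format_stack()[:-1])
        raise RuntimeCheckError(
            f"{message}\nCall stack (most recent call last):\n{stack}"
        )


def set_velocity(velocity):
    """Hypothetical caller that guards its argument with an always-on check."""
    check(0 <= velocity <= 127, f"velocity out of range: {velocity}")
    return velocity
```

Calling `set_velocity(200)` raises an error whose text names the failed condition and lists the chain of callers that led to it, which is the kind of detail a service-bureau operator can act on.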
Another enduring question that plagues many teams is “build or buy?” When is it most advantageous to build custom software, and when is it better to just buy it? As developers, we tend to prefer to build our own stuff all the time (pragmatic author Venkat Subramaniam refers to this as “Resume-Driven Design”).
The team struck a good balance, leveraging open source, commodity-level functionality where possible. Items such as path and extension handling, threading libraries, GUI widgets, and so on were ably handled by libraries from boost.org, the wxWidgets set, and others. They wisely reserved custom crafting where it was most needed, including their core audio algorithms and components such as the MIDI data pretty printer.
Now printing out binary MIDI data in a nicely formatted, human-readable form may sound like a frivolous extra, or at least something one could get off the shelf. But as the team explained to me, they didn’t want some third-party piece of software to mask, distort, filter, or otherwise lie about the data. They needed to know precisely what was going on, with no surprises. Too often we take critical diagnostics on faith, only later to realize that the information wasn’t accurate or complete.
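To give a feel for what such a pretty printer involves, here is a toy sketch that decodes only standard MIDI note-on and note-off messages. It says nothing about the high-definition format or about Zenph's actual tool; the function name and output layout are our own assumptions. It prints the raw bytes next to the decoded fields, and passes through anything it doesn't understand rather than hiding it.

```python
# Upper nibble of the status byte for the two message types decoded here.
KINDS = {0x80: "note-off", 0x90: "note-on"}


def describe_message(data):
    """Format one MIDI message, keeping the raw bytes beside the decoding."""
    raw = " ".join(f"{byte:02X}" for byte in data)
    if len(data) != 3 or (data[0] & 0xF0) not in KINDS:
        # Never drop or guess at data we can't decode; show it untouched.
        return f"{raw}  (not decoded)"
    status, note, velocity = data
    kind = KINDS[status & 0xF0]
    channel = (status & 0x0F) + 1  # channels are conventionally shown as 1-16
    return f"{raw}  {kind} ch={channel} note={note} vel={velocity}"


if __name__ == "__main__":
    print(describe_message(bytes([0x90, 60, 100])))
    print(describe_message(bytes([0xF8])))
```

Keeping the hex bytes in every line of output means a decoding bug in the printer can always be caught by reading the raw data next to it.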