Brian moves from a company where he was one of 40,000 employees to one where he’s one of six. Culture shock ensues.
Most problems these days seem large, or at least most of the interesting ones do. Some of these problems are programming problems and some of them are problems about programming teams.
“How do I scale my architecture to the cloud?”
“How do I scale my team to four continents?”
In this article we’ll look at a small project and a small team and see if we uncover anything helpful.
I Get Googled
Four months ago I left Motorola-Mobility and the world of video on demand. After my company was purchased by Google, I found myself one of over 40,000 employees, a staggering percentage of whom were lawyers. As a Distinguished Member of Technical Staff, one of my primary job functions was to think up ideas to patent. We were scored on how many “disclosures” we generated per quarter. I also served on a board that reviewed our use of open source software, ensuring that all the terms of all the licenses were adhered to. Don’t get me wrong, as the owner of two open source projects I’m in favor of following the license terms. However, it had become another large lawyer-driven part of the business.
Given that we dealt with entertainment content (i.e. movies and TV shows), digital rights management was also a key concern. There was a US court ruling that said if two people recorded the same show, we had to keep two physical copies of the actual bits that were recorded. If ten thousand people all recorded the same show (say, the Super Bowl) we had to store ten thousand separate copies. Naturally we tried to skirt the edges of this ruling while still obeying it. Our design meetings were held with the understanding that anything we said might be repeated as testimony in a future lawsuit.
This did not tend to feel agile.
I spent years trying to institute Scrum and standard best practices like code reviews, static analysis tools, and compiling code before checking it in. (I wish I was making that last item up.) I finally took the advice I heard years ago (though I cannot remember from whom) which was: “change your company or change your company.” Having tried for years to change my company, I decided it was time to change my company.
Changing My Company
My new company has a grand total of six employees, three of whom are programmers. We have one engineer who maintains the existing system while the CEO and I build the new system. To say that this move was a culture shock would be an understatement.
The new place doesn’t use any of the best practices I had tried so hard to implement at the Big Shop. We don’t do formal code inspections, we don’t have a Style Guide printed in gold (and blood). We don’t run PMD, FindBugs, or CheckStyle.
But the thing that shocked me after a few weeks was that I didn’t miss those things. In fact, I’ve come to believe that most best practices are just crutches for having a heterogeneous skill mix in one’s team. Heresy! Let me say that again: best practices are crutches. I am reminded of the comment often made by proponents of modern languages like Ruby and Scala that design patterns are simply compensations for the lack of power in legacy languages like Java. You don’t see Scala programmers talking about the Singleton pattern or the Strategy pattern because we have closures and map reduce built right into the language.
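To make the Strategy pattern remark concrete, here is a minimal sketch, written in Python rather than Scala for brevity. The names `make_discount` and `checkout` are invented for this illustration and come from no real codebase. A closure takes the place of the interface-plus-classes arrangement the pattern usually requires:

```python
def make_discount(percent):
    """Return a pricing strategy as a closure that remembers `percent`."""
    def apply(price):
        return price * (100 - percent) / 100
    return apply


def checkout(prices, strategy):
    """Apply whichever pricing strategy the caller passes in."""
    return [strategy(price) for price in prices]


def full_price(price):
    """The do-nothing strategy: charge the listed price."""
    return price


half_off = make_discount(50)

regular = checkout([10, 20], full_price)
sale = checkout([10, 20], half_off)
```

The "strategy" is just a function value, so nothing remains for a pattern to name.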
Instead of a team whose members object to rules being enforced because they cannot follow them, we have a team that doesn’t need the rules because we wouldn’t dream of not following them.
Every day when we come in, the company president and I sit down and talk about what we’re working on. We talk about the problems we’re trying to solve, show each other how we dealt with yesterday’s problems, and share things we’ve read. We’re both voracious readers and so most days one or the other of us has read a new book or chapter of a book, and we talk about it. We set out a plan for the day and then go back to our offices and execute. That may sound like a daily scrum, but it is far less formal and more organic. You could also describe it as two people working together. (People, not resources.)
What We Don’t Do
We don’t do formal code inspections, but I know that my boss looks at all my checkins, so I make sure he doesn’t find anything to object to. If either of us is going to make a significant change, we talk about it beforehand.
We don’t have an official burn down chart and we don’t use GreenHopper or any such tools. We’ve sketched out our system architecture on a large whiteboard, and as pieces are completed we put check boxes next to them. Problems with a piece are similarly marked on the whiteboard. Again, it is informal and feels organic. It’s hard now to imagine doing it any other way.
I’ll also say that my office configuration rocks. I have an actual (large) office with a door, but the main wall of the office is glass. This gives me sound isolation and heads-down privacy when I need it, but doesn’t make me feel cut off from everyone else. The office is approximately twice the size of my cube at Motorola, which gives me room for double-monitors, desk space, whiteboard, and even a small conference table. It’s as if the company values me as more than a “resource”! I even have a window that opens. In thirty years I’ve never had a window that opens.
In other matters:
We don’t use email at all... we talk to each other.
We don’t hold meetings... we talk to each other.
We don’t design via Design Documents... we talk to each other.
See any pattern here?
Not to paint too rosy a picture, it’s worth mentioning the stress associated with a startup. At Motorola I was creating prototypes of social TV tools that would be used as the basis for new products and for patent-related lawsuits. In both cases the outcome would not occur for years, and many within the company assumed that Motorola would be long gone by then. So any urgency to perform the work had to be self-generated.
At Cabot, by comparison, we’re measuring progress day by day. Every day without the product means another day of sales that didn’t happen. If I don’t build redundancy into our solution, no one but me will be answering the support call. So my MacBook and I are working at the office, on the train, on the couch, etc. I’m tired most of the time, but it’s tired from working rather than tired from being frustrated, and that makes all the difference. (Dog trainers have a motto that applies to engineers as well: “a tired dog is a happy dog!”)
This Can’t Last
Now, remember that friend I mentioned earlier? He congratulated me on the new position, but also gave me a warning. He said that my company was likely to grow and that I’d get to shape what sort of shop we evolve into. I suspect we’re going to follow the “Joel on Software” philosophy of hiring, where the only two choices about a potential hire are “Hell yes” or “No thanks.” In other words, we aim to hire very selectively.
At the same time, we will in fact grow. I wonder how far this tiny organic process can scale? When we have five developers will we still want to meet in the CEO’s (or my?) office every morning? Will we have to pick a start time for this “meeting”? Will we have to impose a branching strategy for our SCM? Will we have to start writing documents?
This question mirrors the scaling questions we face with programs and hardware. Scaling a database server (especially a traditional SQL server) “up” is simply a matter of getting a bigger box to run it on. This corresponds to hiring really good people and giving them a highly productive environment to work in. At some point, however, one reaches a limit and has to scale “out.” In traditional SQL servers this meant tackling clusters, masters, and slaves and all manner of expensive black arts. In modern databases such as Cassandra, this means just adding another node to the cluster (it’s not really quite that simple, of course, but for this analogy it’s close enough).
Just what the NoSQL-style scale-out solution for growing a team looks like remains to be seen.
Cassandra can scale out because each node is able to do any of the units of work that are sent to the cluster. The cluster is eventually consistent, in that information supplied to one node is propagated to the other nodes. One can configure how many nodes need to have the data before it is considered “saved,” and there are standard protocols for discovering nodes (gossip) and for propagating data between nodes.
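The idea of configuring how many nodes must hold the data before it counts as “saved” can be sketched with a toy model. This is not Cassandra’s actual API: the `Node` class and the `write_with_consistency` and `quorum` functions are hypothetical names invented here. Gossip and background propagation are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy replica; `up` simulates whether the node is reachable."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        """Store the value and acknowledge; a node that is down acknowledges nothing."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def write_with_consistency(nodes, key, value, required_acks):
    """Send the write to every replica; it is 'saved' if enough of them acknowledge."""
    acks = sum(1 for node in nodes if node.write(key, value))
    return acks >= required_acks


def quorum(replica_count):
    """A simple majority of the replicas."""
    return replica_count // 2 + 1


# Three replicas, one of which is unreachable (an assumption for the example).
nodes = [Node("a"), Node("b"), Node("c", up=False)]

saved_at_quorum = write_with_consistency(nodes, "show", "Super Bowl", quorum(len(nodes)))
saved_at_all = write_with_consistency(nodes, "show", "Super Bowl", len(nodes))
```

With one of the three nodes down, the write succeeds when only a majority is required and fails when every replica must acknowledge. That is the trade-off between availability and consistency that the configuration setting controls.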
How, if at all, does this apply to teams? After all, in most large companies we fight a constant battle with management not to be considered fungible, like some off-the-shelf server box that can be added to increase performance. And yet, perhaps the analogy works, because even off-the-shelf hardware isn’t totally fungible. Sure, just about anything with a CPU will run Java, but will it run well? There is a world of difference between a 64-bit Linux box with multiple fast hard drives and 16 gigs of memory and a stock user desktop machine running Vista. One of my current responsibilities is to design our Amazon EC2 system specifications for a cluster of machines running a collection of server products. Deciding how to balance machine size vs. number of machines, speed vs. cost, CPU vs. disk for a collection of collections is not a trivial task. The trick, of course, is that once you figure out the type of machine you need, you can go buy lots of them. Not so much with engineers.
(The word “fungible” is worth knowing the definition of: “able to replace or be replaced by another identical item; mutually interchangeable.”)
How to Hire
There is a whole industry built around the promise of helping companies find more developers just like “that guy.” Some companies grill candidates on algorithm theory, others do sets of white-board challenges.
An interesting trend is the rise of sites like LinkedIn, StackOverflow, and GitResume. Each is based on the premise that the “interview” for a job occurs in the years prior to applying for the job. If you are connected to lots of smart people, or have invested time in answering questions for others or have participated in group projects, then you pass. The shibboleth here is connection, and long-term connection at that. I think the long-term part is critical.
Prior to a Google interview, I was given the suggestion to re-read Sedgewick’s classic text on Algorithms. This was disappointing, as any test that you can “cram” for seems weak. Software has collaboration and communication at its heart, so selecting candidates for demonstrated skills in those areas seems like a better test than whether they can code up a hash algorithm on demand.
I’ve seen teams succeed and I’ve seen teams fail; there are a lot of intangibles. I’ve seen teams with good spirit and so-so skills perform well, but I’ve never seen a team with excellent technical skills and no social ones do anything but crater.
So I hope that as we grow we’ll look for a balance of skills. Fingers crossed!
Brian Tarbox is an engineer at Cabot Research in Boston working on financial services software. Prior to that he spent eight years as a Distinguished Member of Technical Staff at Motorola. He writes a blog on the intersection of software design, cognition, music, and creativity at briantarbox.blogspot.com and contributes to a variety of open source projects.