Published: March 2015
Jack the Ripper and legacy codebases have more in common than you’d think. Inspired by forensic psychology methods, this book teaches you strategies to predict the future of your codebase, assess refactoring direction, and understand how your team influences the design. With its unique blend of forensic psychology and code analysis, this book arms you with the strategies you need, no matter what programming language you use. Check out the author’s TEDx talk about the book here.
Printed in full color.
Software is a living entity that’s constantly changing. To understand software systems, we need to know where they came from and how they evolved. By mining commit data and analyzing the history of your code, you can start fixes ahead of time to eliminate broken designs, maintenance issues, and team productivity bottlenecks.
In this book, you’ll learn forensic psychology techniques to successfully maintain your software. You’ll create a geographic profile from your commit data to find hotspots, and apply temporal coupling concepts to uncover hidden relationships between unrelated areas in your code. You’ll also measure the effectiveness of your code improvements. You’ll learn how to apply these techniques on projects both large and small. For small projects, you’ll get new insights into your design and how well the code fits your ideas. For large projects, you’ll identify the good and the fragile parts.
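As a rough illustration of the temporal coupling idea (a sketch, not code from the book), the Python below counts how often two files change in the same commit. The function name `temporal_coupling`, the `min_shared` threshold, and the coupling formula (shared commits divided by the average number of revisions of the two files) are assumptions made for this example, and the file names are invented.

```python
from collections import Counter
from itertools import combinations


def temporal_coupling(commits, min_shared=2):
    """Return (file_a, file_b, shared_commits, degree_percent) tuples.

    `commits` is a list of file-name lists, one list per commit.
    The degree is an assumed metric: shared commits divided by the
    average number of revisions of the two files, as a percentage.
    """
    revisions = Counter()  # commits touching each file
    shared = Counter()     # commits touching both files of a pair
    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)
        shared.update(combinations(unique, 2))

    results = []
    for (a, b), count in shared.items():
        if count < min_shared:
            continue  # ignore pairs that rarely change together
        average = (revisions[a] + revisions[b]) / 2
        results.append((a, b, count, round(100 * count / average)))
    # Strongest coupling first; names break ties for stable output.
    results.sort(key=lambda r: (-r[3], r[0], r[1]))
    return results


# Hypothetical commit history used only for illustration.
history = [
    ["order.py", "invoice.py"],
    ["order.py", "invoice.py", "readme.md"],
    ["order.py"],
    ["invoice.py", "order.py"],
    ["readme.md"],
]

for a, b, count, degree in temporal_coupling(history):
    print(f"{a} <-> {b}: {count} shared commits, {degree}% coupled")
```

Files that keep changing together without an obvious dependency between them are the candidates worth a closer look.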
Large-scale development is also a social activity, and the team’s dynamics influence code quality. That’s why this book shows you how to uncover social biases when analyzing the evolution of your system. You’ll use commit messages as eyewitness accounts of what is really happening in your code. Finally, you’ll put it all together by tracking organizational problems in the code and finding out how to fix them. Come join the hunt for better code!
How did you come up with the metaphor of the source code being a crime scene?
Well, I was in the middle of my psychology studies when I joined a course in forensics. At the same time, I was working full-time as a software developer fighting some scary large-scale legacy systems on a regular basis. The main challenge there is always to know which parts of the codebase really matter. Which parts of the code become productivity bottlenecks? Which parts are hard to maintain? Where will the bugs be?
As I got into forensics, I realized that crime investigators face open-ended, large-scale problems similar to ours. And modern forensic psychologists attack these problems with methods that are useful to us software developers too. I decided to explore this connection and find out how we can apply it to code.
What are some of the forensics concepts we will learn about in this book?
The eye-opener to me, and the technique we’ll use as a metaphor to reason about code, is geographical offender profiling. A geographical offender profile uses the spatial movement of criminals to identify their home bases. It works by calculating a probability surface and projecting it onto a real-world geography. So, I thought, what if we could do the same for software?
In our case the offender is code. So we learn techniques to identify patterns in the evolution of your code, how you’ve worked with it so far. That gives you the power to predict its future, to find the code that’s hard to evolve and prone to defects - our offenders!
It’s not only about complex code - complexity only matters if we need to deal with it. That’s why it’s important to identify the overlap between code that is complicated and code that we have to work with often. It’s a simple technique that works surprisingly well in practice. Of course, we’ll also support it with findings from empirical software research - what you learn is not just opinion but is based on practices that have been shown to work on real-world projects.
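To make the overlap concrete, here is a minimal sketch (not the book's implementation) that ranks files by combining change frequency with a complexity measure. Using lines of code as the complexity proxy, multiplying the two numbers into a single score, and the names `change_frequencies` and `hotspots` are all assumptions for illustration; the sample data is invented.

```python
from collections import Counter


def change_frequencies(commits):
    """Count how many commits touched each file."""
    counts = Counter()
    for files in commits:
        counts.update(set(files))  # a file counts once per commit
    return counts


def hotspots(commits, complexity, top=3):
    """Rank files that are both complex and frequently changed.

    `complexity` maps file name to a complexity proxy (here: lines
    of code). The score, frequency * complexity, is an assumed
    heuristic chosen for this sketch.
    """
    frequency = change_frequencies(commits)
    scored = [
        (path, frequency[path], complexity[path],
         frequency[path] * complexity[path])
        for path in frequency
        if path in complexity
    ]
    # Highest score first; file name breaks ties for stable output.
    scored.sort(key=lambda row: (-row[3], row[0]))
    return scored[:top]


# Hypothetical data used only for illustration.
sample_commits = [
    ["core.py", "ui.py"],
    ["core.py"],
    ["core.py", "util.py"],
    ["ui.py"],
]
sample_complexity = {"core.py": 900, "ui.py": 200, "util.py": 1500}

for path, revs, size, score in hotspots(sample_commits, sample_complexity):
    print(f"{path}: {revs} revisions, {size} lines, score {score}")
```

In this sample, `util.py` is the largest file but changes rarely, so it ranks below `core.py`, which is both large and frequently modified.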
Large-scale software development is also a social activity. That means it’s prone to the same social biases that we fall for in everyday life. So here we’ll look into some forensic cases gone wrong, learn from their mistakes, and apply our new knowledge to reason about teamwork, organizations, and software architectures.
I don’t have a background in psychology. Will I be able to follow along?
I’ve made sure to explain the concepts we meet. Psychology matters to us since our primary tool as developers isn’t the computer - it’s our brain.
Tell me more about Code Maat.
The analysis techniques are based on version-control data. As such, you’ll learn to mine data from your source code repositories and find interesting patterns in the evolution of your code. Code Maat is just a tool to automate the boring parts of that process.
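For readers who want a feel for what mining version-control data involves, the sketch below parses the text produced by `git log --name-only --pretty=format:--%h` into a list of commits with their changed files. This is an illustrative assumption about one workable log format, not a description of how Code Maat works internally; the `--` marker and the names `parse_log` and `read_git_log` are choices made for this example.

```python
import subprocess


def parse_log(text):
    """Parse `git log --name-only --pretty=format:--%h` output.

    Each commit starts with a line like `--1a2b3c4`, followed by
    the paths changed in that commit. Blank lines are skipped.
    """
    commits = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("--"):
            current = {"rev": line[2:], "files": []}
            commits.append(current)
        elif line and current is not None:
            current["files"].append(line)
    return commits


def read_git_log(repo_path):
    """Run git in `repo_path` and return the parsed commits."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log",
         "--name-only", "--pretty=format:--%h"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


# Hand-written sample in the assumed format, for illustration only.
sample_log = """--a1b2c3d
src/core.py
src/ui.py

--e4f5a6b
src/core.py
"""

for commit in parse_log(sample_log):
    print(commit["rev"], commit["files"])
```

The resulting list of file lists per commit is the raw material for frequency counts and co-change analyses.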
In fact, I open-sourced Code Maat as a quick-start to put the techniques you learn about in the book into practice. We’ll also use the source code of Code Maat for some case studies. The only reason for that is that it feels better to rip my own design decisions to shreds than to criticize the work of others where I don’t share the original context.
That said, we’ll investigate several other codebases as well so that we get a feel for how the different techniques complement each other. Out of all that, the tool itself is the least important part.
But wait, you are saying I don’t need to use Code Maat to work with this book. What other tools can I use instead?
I’m pretty sure that these techniques will become mainstream in a few years - the information we can mine from our source code repositories is just too useful to be ignored. When that happens, you’ll have several tools to choose from (both commercial and free).
But until that happens, I’d recommend that you tailor the tools to your specific needs. The algorithms aren’t that hard to implement and we cover them all in the book. In addition, it’s easy to build more elaborate tools on top of Code Maat. Code Maat generates CSV output that’s straightforward to post-process and visualize in any way you choose.
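As one possible way to post-process such CSV output, the sketch below reads a small table and prints a text bar chart of the most frequently revised entities. The column names `entity` and `n-revs` and the sample rows are assumptions made for this example; check the header of the file your tool actually produces and adjust the `column` argument accordingly.

```python
import csv
import io

# Hypothetical CSV content; the column names are assumed.
SAMPLE_CSV = """entity,n-revs
src/core.clj,42
src/app.clj,17
test/core_test.clj,8
"""


def top_entities(csv_text, column="n-revs", limit=10):
    """Return (entity, value) pairs sorted by `column`, largest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row[column]), reverse=True)
    return [(row["entity"], int(row[column])) for row in rows[:limit]]


def bar_chart(pairs, width=40):
    """Render pairs as text bars scaled to the largest value."""
    if not pairs:
        return ""
    largest = max(value for _, value in pairs)
    lines = []
    for name, value in pairs:
        bar = "#" * max(1, round(width * value / largest))
        lines.append(f"{name:<24}{bar} {value}")
    return "\n".join(lines)


print(bar_chart(top_entities(SAMPLE_CSV)))
```

The same pairs could just as well be fed to a plotting library or a spreadsheet; the text chart only keeps the example dependency-free.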
Finally, there are other good options. I know that Michael Feathers, who wrote the foreword to the book, has open-sourced the tool he uses to analyze Ruby code repositories. There’s also the Moose project, which provides an open platform to build your own custom analyses.