Your Code as a Crime Scene

Use Forensic Techniques to Arrest Defects, Bottlenecks, and Bad Design in Your Programs

by Adam Tornhill

Jack the Ripper and legacy codebases have more in common than you’d think. Inspired by forensic psychology methods, this book teaches you strategies to predict the future of your codebase, assess refactoring direction, and understand how your team influences the design. With its unique blend of forensic psychology and code analysis, this book arms you with the strategies you need, no matter what programming language you use.

Check out the author’s TEDxTalk about the book here.

Printed in full color.

Out of print

Software is a living entity that’s constantly changing. To understand software systems, we need to know where they came from and how they evolved. By mining commit data and analyzing the history of your code, you can start fixes ahead of time to eliminate broken designs, maintenance issues, and team productivity bottlenecks.

In this book, you’ll learn forensic psychology techniques to successfully maintain your software. You’ll create a geographic profile from your commit data to find hotspots, and apply temporal coupling concepts to uncover hidden relationships between unrelated areas in your code. You’ll also measure the effectiveness of your code improvements. You’ll learn how to apply these techniques on projects both large and small. For small projects, you’ll get new insights into your design and how well the code fits your ideas. For large projects, you’ll identify the good and the fragile parts.
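To make the temporal coupling idea concrete, here is a minimal Python 3 sketch that counts how often two files change in the same commit. It is an illustration under stated assumptions, not the book's algorithm or Code Maat's implementation: the function name `temporal_coupling` and the commit data are invented for this example.

```python
from collections import Counter
from itertools import combinations


def temporal_coupling(commits):
    """Count how often each pair of files appears in the same commit.

    `commits` is a list of file-name collections, one per commit.
    """
    pair_counts = Counter()
    for files in commits:
        # Sort the de-duplicated names so (a, b) and (b, a) are one pair.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical commit history, invented for illustration.
commits = [
    ["billing.py", "report.py"],
    ["billing.py", "report.py", "ui.py"],
    ["ui.py"],
    ["billing.py", "report.py"],
]

# Print the pairs that changed together most often.
for (first, second), shared in temporal_coupling(commits).most_common(3):
    print(first, second, shared)
```

Files that keep showing up together without any visible dependency between them are the "hidden relationships" worth a closer look.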

Large-scale development is also a social activity, and the team’s dynamics influence code quality. That’s why this book shows you how to uncover social biases when analyzing the evolution of your system. You’ll use commit messages as eyewitness accounts to what is really happening in your code. Finally, you’ll put it all together by tracking organizational problems in the code and finding out how to fix them. Come join the hunt for better code!

Q&A with Adam Tornhill, author of Your Code as a Crime Scene

How did you come up with the metaphor of the source code being a crime scene?

Well, I was in the middle of my psychology studies when I joined a course in forensics. At the same time, I was working full-time as a software developer fighting some scary large-scale legacy systems on a regular basis. The main challenge there is always to know which parts of the codebase really matter. Which parts of the code become productivity bottlenecks? Which parts are hard to maintain? Where will the bugs be?

As I got into forensics, I realized that crime investigators face open-ended, large-scale problems similar to ours. And modern forensic psychologists attack these problems with methods that are useful to us software developers too. I decided to explore this connection and find out how we can apply it to code.

What are some of the forensics concepts we will learn about in this book?

The eye-opener to me, and the technique we’ll use as a metaphor to reason about code, is geographical offender profiling. A geographical offender profile uses the spatial movement of criminals to identify their home bases. It works by calculating a probability surface and projecting it onto a real-world geography. So, I thought, what if we could do the same for software?

In our case the offender is code. So we learn techniques to identify patterns in the evolution of your code, how you’ve worked with it so far. That gives you the power to predict its future, to find the code that’s hard to evolve and prone to defects - our offenders!

It’s not only about complex code - complexity only matters if we need to deal with it. That’s why it’s important to identify the overlap between code that is complicated and code that we have to work with often. It’s a simple technique that works surprisingly well in practice. Of course we’ll also support it with findings from empirical software research - what you learn is not just opinion but is based on practices that have been shown to work on real-world projects.
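The overlap described above can be sketched in a few lines of Python 3. This is a rough illustration rather than the book's method: the scoring formula (the product of normalized change frequency and normalized size), the use of lines of code as a stand-in for complexity, and all file names and numbers are assumptions made for this example.

```python
def rank_hotspots(change_counts, complexity, top=3):
    """Rank files that are both frequently changed and complex.

    Both arguments map file names to positive numbers. The product of
    the two normalized values is an assumed score, chosen for simplicity.
    """
    shared = [name for name in change_counts if name in complexity]
    if not shared:
        return []
    max_changes = max(change_counts[name] for name in shared)
    max_complexity = max(complexity[name] for name in shared)
    scores = {
        name: (change_counts[name] / max_changes)
        * (complexity[name] / max_complexity)
        for name in shared
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical data: number of commits per file and lines of code per file.
change_counts = {"parser.py": 40, "util.py": 35, "legacy_io.py": 1}
lines_of_code = {"parser.py": 1200, "util.py": 90, "legacy_io.py": 1500}

for name, score in rank_hotspots(change_counts, lines_of_code):
    print(name, round(score, 3))
```

In this invented data set, `legacy_io.py` is the largest file but almost never changes, so it ranks last, while `parser.py` is both large and frequently modified and comes out on top.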

Large-scale software development is also a social activity. That means it’s prone to the same social biases that we fall for in everyday life. So here we’ll look into some forensic cases gone wrong, learn from their mistakes, and apply our new knowledge to reason about teamwork, organizations, and software architectures.

I don’t have a background in psychology. Will I be able to follow along?

I’ve made sure to explain the concepts we meet. Psychology matters to us since our primary tool as developers isn’t the computer - it’s our brain, and psychology is about how we function. It’s about how we learn, solve problems, reason, and work with others. All these areas relate to our everyday development activities.

Tell me more about Code Maat.

The analysis techniques are based on version-control data. As such, you’ll learn to mine data from your source code repositories and find interesting patterns in the evolution of your code. Code Maat is just a tool to automate the boring parts of that process.
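As a small illustration of what mining version-control data can look like, the Python 3 sketch below counts how many commits touched each file. It assumes log text in which every commit starts with a marker line beginning with `--`, followed by one file path per line. That layout, the function name `count_revisions`, and the sample log are assumptions for this example and are not Code Maat's input format.

```python
from collections import Counter


def count_revisions(log_text):
    """Count how many commits touched each file.

    Lines starting with "--" are treated as commit markers and blank
    lines as separators; every other line is taken to be a file path.
    """
    revisions = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):
            continue
        revisions[line] += 1
    return revisions


# Hypothetical log excerpt, invented for illustration.
sample_log = """--a1b2c3
src/core.py
src/util.py

--d4e5f6
src/core.py
"""

for path, count in count_revisions(sample_log).most_common():
    print(path, count)
```

In a real repository, text in this shape can be produced with `git log --name-only --pretty=format:--%h` and passed to the function as a string.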

In fact, I open-sourced Code Maat as a quick-start to put the techniques you learn about in the book into practice. We’ll also use the source code of Code Maat for some case studies. The only reason for that is that it feels better to rip my own design decisions to shreds than to criticize the work of others, where I don’t share the original context.

That said, we’ll investigate several other codebases as well so that we get a feel for how the different techniques complement each other. Out of all that, the tool itself is the least important part.

But wait, you are saying I don’t need to use Code Maat to work with this book. What other tools can I use instead?

I’m pretty sure that these techniques will become mainstream in a few years - the information we can mine from our source code repositories is just too useful to be ignored. When that happens, you’ll have several tools to choose from (both commercial and free).

But until that happens, I’d recommend that you tailor the tools to your specific needs. The algorithms aren’t that hard to implement and we cover them all in the book. In addition, it’s easy to build more elaborate tools on top of Code Maat. Code Maat generates CSV output that’s straightforward to post-process and visualize in any way you choose.
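Post-processing CSV output of this kind needs little more than the standard library. The Python 3 sketch below sorts rows by a numeric column and keeps the top entries. The column names `entity` and `n-revs`, the helper name `top_entities`, and the sample rows are assumptions for illustration, so check the header of the CSV your tool actually produces.

```python
import csv
import io


def top_entities(csv_text, column, limit=5):
    """Return the `limit` rows with the highest value in `column`.

    Assumes the CSV has an "entity" column naming each file and that
    `column` holds integers.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row[column]), reverse=True)
    return [(row["entity"], int(row[column])) for row in rows[:limit]]


# Hypothetical analysis output, invented for illustration.
sample_csv = """entity,n-revs
src/core.py,42
src/util.py,7
src/report.py,19
"""

for entity, revisions in top_entities(sample_csv, "n-revs", limit=2):
    print(entity, revisions)
```

The resulting list of pairs can be handed to whatever charting or reporting tool you prefer.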

Finally, there are other good options. I know that Michael Feathers, who wrote the foreword to the book, has open-sourced the tool he uses to analyze Ruby code repositories. There’s also the Moose project, which provides an open platform to build your own custom analyses.

What You Need

You need Java 6 and Python 2.7 to run the accompanying analysis tools. You also need Git to follow along with the examples.

Resources

Errata, typos, suggestions

Releases:

P1.0 2015/03/24
B3.0 2015/03/03
B2.0 2015/01/12
B1.0 2014/12/02

Contents & Extracts

Introduction

Evolving Software
- Code as a Crime Scene
  - Meet the Problems of Scale
  - Get a Crash Course in Offender Profiling
  - Profiling the Ripper
  - Apply Geographical Offender Profiling to Code
  - Learn From the Spatial Movement of Programmers
  - Find Your Own Hotspots
- Creating an Offender Profile
  - Mining Evolutionary Data
  - Automated Mining With Code Maat
  - Add the Complexity Dimension
  - Merge Complexity and Effort
  - Limitations of the Hotspot Criteria
  - Use Hotspots as a Guide
  - Dig Deeper
- Analyze Hotspots in Large-Scale Systems
  - Analyze a Large Codebase
  - Visualize Hotspots
  - Explore the Visualization
  - Study the Distribution of Hotspots
  - Differentiate Between True Problems and False Positives
- Judge Hotspots with the Power of Names
  - Know the Cognitive Advantages of Good Names
  - Investigate a Hotspot by Its Name
  - Understand the Limitations of Heuristics
- Calculate Complexity Trends From Your Code’s Shape
  - Complexity by the Visual Shape of Programs
  - Learn About the Negative Space in Code
  - Analyze Complexity Trends in Hotspots
  - Evaluate the Growth Patterns
  - From Individual Hotspots to Architectures
Dissect Your Architecture
- Treat Your Code as a Cooperative Witness
  - Know How Your Brain Deceives You
  - Learn the Modus Operandi of a Code Change
  - Use Temporal Coupling to Reduce Bias
  - Prepare to Analyze Temporal Coupling
- Detect Architectural Decay
  - Support Your Re-Designs With Data
  - Analyze Temporal Coupling
  - Catch Architectural Decay
  - React to Structural Trends
  - Scale to System Architectures
- Build a Safety Net for Your Architecture
  - Know What’s in an Architecture
  - Analyze the Evolution on System Level
  - Differentiate Between the Level of Tests
  - Create a Safety Net for Your Automated Tests
  - Know the Costs of Automation Gone Wrong
- Use Beauty as a Guiding Principle
  - Learn Why Attractiveness Matters
  - Write Beautiful Code
  - Avoid Surprises in Your Architecture
  - Analyze Layered Architectures
  - Find Surprising Change Patterns
  - Expand Your Analyses
Master the Social Aspects of Code
- Norms, Groups and False Serial Killers
  - Learn Why the Right People Don’t Speak Up
  - Understand Pluralistic Ignorance
  - Witness Groupthink in Action
  - Discover Your Team’s Modus Operandi
  - Mine Organizational Metrics From Code
- Discover Organizational Metrics in Your Codebase
  - Let’s Work in the Communication Business
  - Find the Social Problems of Scale
  - Measure Temporal Coupling Over Organizational Boundaries
  - Evaluate Communication Costs
  - Take it Step-By-Step
- Build a Knowledge Map of Your System
  - Know Your Knowledge Distribution
  - Grow Your Mental Maps
  - Investigate Knowledge in the Scala Repository
  - Visualize Knowledge Loss
  - Get More Details With Code Churn
- Dive Deeper With Code Churn
  - Cure the Disease, Not the Symptoms
  - Discover Your Process Loss From Code
  - Investigate the Disposal Sites of Killers and Code
  - Predict Defects
  - Time to Move On
- Towards The Future
  - Let Your Questions Guide Your Analysis
  - Take Other Approaches
  - Let’s Look Into the Future
  - Write to Evolve
- Refactoring Hotspots
  - Refactor Guided by Names
- Bibliography
- Index

Author

Adam Tornhill combines degrees in engineering and psychology to get a different perspective on software. He works as an architect and programmer and also writes open-source software in a variety of programming languages. He’s the author of the popular book Lisp for the Web and has self-published a book on Patterns in C. Other interests include modern history, music, and martial arts.
