Sunday, December 30, 2012

Make it Easy

One of my overriding principles is: make it easy for people to do the right thing.

This seems like it should be a no-brainer, but it was not always obvious to me. Early in my career I was a bit of a self-appointed build cop. The team I worked on was an adopter of some of the agile/extreme programming principles, and the result of that was a 40+ person team all working against the same code base, which was deployed weekly for 3 distinct business purposes. All development was done against trunk, using feature flags. We managed to do this through the heavy use of automated unit/integration testing; to check code in, you were expected to write tests of course, and to run the entire test suite to successful completion before checking in.

Unsurprisingly, people did this only to a certain level of compliance. It drove me crazy when people broke the build, especially in a way that indicated they had not bothered to run tests before they checked in. So I became the person that would nag them about it, call them out for breaking things, and generally intimidate my way into good behavior. Needless to say, that only worked so well. People were not malicious, but the tests took a LONG time to run (upwards of 4 hours at the worst), and on the older desktops you couldn't even get much work done while the test suite ran. In the 4 hours that someone was running tests another person might have checked in a conflicting change that caused errors; was the first person really supposed to re-merge and run tests for another 4 hours to make sure things were clean? It was an unsustainable situation. All my intimidation and bullying wasn't going to cause perfect compliance.

Even ignoring people breaking the build, this was an issue we needed to tackle. And so we did, taking several months improve the overall runtime and make things easier. We teased out test suites into specific ones for the distinct business purposes combined with a core test suite. We made it so that developers could run the build on distributed hardware from their local machine. We figured out how to run certain tests in parallel, and moved database-dependent tests into in-memory databases. The test run time went way down, and even better, folks could kick off the tests remotely and continue to work on their machine, so there was much less reason to try and sneak in an untested change. And lo and behold, compliance went way up. All the sudden my build cop duties were rarely required, and the whole team was more likely to take on that job rather than leaving it to me.

Make it easy goes up and down the stack, far beyond process improvements. I occasionally find myself at odds with folks that see the purity of implementing certain standards and ignore the fact that those standards, taken to extreme, make it harder for people to do the right thing. One example is REST standards. You can use the http verbs to modify the meanings of your endpoints and make them do different things, and from a computer-brain perspective, this is totally reasonable. But this can be very bad when you must add the human brain perspective to the mix. Recently an engineer proposed that we change some endpoints from being called /sysinfo (which would return OK or DEAD depending on whether a service was accepting requests), and /drain (which would switch the /sysinfo endpoint to always return DEAD), into one endpoint. That endpoint would be /sys/drain. When called with GET, it would return OK or DEAD. When called with PUT, it would act as the old drain.

To me, this is a great example of making something hard. I don't see the http verb, I see the name of the endpoint, and I see the potential for human error. If I'm looking for the status-giving endpoint, I would never guess that it would be the one called "drain", and I would certainly not risk trying to call it to find out. Even knowing what it does, I see myself accidentally calling the endpoint with GET, now I didn't drain my service before restarting it. Or I accidentally called it with PUT and now it's been taken out of the load balancer. To a computer brain, GET and PUT are very different, and hard to screw up, but when I'm typing a curl or using postman to call an endpoint, it's very easy for me as a human to make a mistake. In this case, we're not making it easy for people using the endpoints to do the right thing, we're making it easy for them to be confused, or worse, to act in error. And to what benefit? REST purity? Any quest for purity that ignores human readability does so at its peril.

All this doesn't mean I want to give everyone safety scissors. I generally prefer to use frameworks that force me and my team to do more implementation work rather than making it trivially easy. I want to make the "easy" path the one that forces folks to understand the implementation to a certain level of depth, and encourages using only the tools necessary for the job. This makes better developers of my whole team, and makes debugging production problems more science than magic, not to mention the advantage it gives you when designing for scale and general future-proofing.

Many great engineers are tripped up by human nature, when there's really no need to be. Look at your critical human-involving processes and think: am I making it easy for people to do the right thing here? Can I make it even easier? It might take more work up front on your part, or even more verbosity in your code, but it's worth it in the long run.

1 comment:

  1. Great post and insight. Reminds me of Hernando De Soto's "The Mystery of Capital". Same concept, different application.