Behavioral Impact of Tracking Metrics: Loading vs. Unloading a Dishwasher

TL;DR
Don't take a metric at face value; dig deeper and question the behavior that tracking it promotes.

Metrics. The be-all and end-all of any IT ops team and something that most IT executive leaders track on a weekly, if not daily, basis.

I like that lately more traditionally non-tech companies are focusing on metrics to increase efficiency throughout the software life cycle. However, one thing I notice missing is an informed discussion not only of which metrics are important, but of which metrics promote the desired behavior in teams.

To Load or Unload, that is the Question!

To illustrate the importance of this point, let's look at a simple everyday example: dishwashing.

Let's say you live with a roommate, and you need to divvy up the responsibility of dishwashing. A very reasonable thing to do is to alternate the responsibility between the two of you. Now the question is how you track who did the dishes last. Most people will probably go by "Who loaded and ran the dishwasher last?". I argue that this is the wrong metric. Not because it fails to track the effort properly, but because it promotes the wrong behavior. The correct metric to track is "Who unloaded the dishwasher last?".

Who loaded the dishwasher last?

First, let's look at the behavior it promotes:

  1. Once you have loaded and run the dishwasher, you have no incentive to unload it.
  2. Since you are not unloading it, you usually end up with clean dishes sitting in the dishwasher, pulling out only what's necessary.
  3. This leads to dirty dishes sitting in the kitchen sink until the dishwasher is organically emptied.
  4. Also, it's in your self-interest to load and run the dishwasher before it's completely full, so you effectively have to rinse and load fewer dishes.

So, the net effect is two-fold:

  1. Your kitchen sink is usually full of dirty dishes.
  2. You pay higher energy costs, since the dishwasher is not always run completely full.

Who unloaded the dishwasher last?

Again, looking at the behavior this promotes:

  1. The dishwasher almost always has dirty dishes in it.
  2. Since the dishwasher mostly holds dirty dishes, new dishes get rinsed and put in right away instead of sitting in the kitchen sink.
  3. There is no incentive to run the dishwasher early, so it gets run only when it's needed.

As you can see, if you only wanted to track one of the two metrics, tracking unloads promotes better behavior!

Tracking Software

Now let's take this example into the IT field. A couple of the most commonly tracked and cited metrics are:

  1. Number of Priority 1 (P1) tickets.
  2. Number of Failed Change Requests (CR).

P1s

Tracking the number of P1s is not inherently bad, but when it becomes the primary factor in judging the stability and performance of a team, it crosses a line into promoting bad behavior.

Here is what happens when the executive leadership of an IT department starts monitoring P1s closely:

  1. The ops teams start downgrading P1s to P2s or even P3s.
  2. Issues reported directly by business users sometimes don't even make it into the official ticket tracking system and are just resolved off the books.

Failed CRs

Tracking failed CRs sounds like a good metric, but in reality it is probably worse than tracking P1s. Let's analyze what behavior we promote by doing this:

  1. The overall amount of paperwork done for CRs goes through the roof, because if any CR fails, the "remediation" meeting will usually dissect the paperwork rather than the true root cause of the failure, so everyone tries to cover their rear ends.
  2. An unreasonable amount of testing starts happening, for the exact same reason: if the CR fails, you don't want to be the one who didn't test your change enough. So even a simple change to reduce the amount of data retained by a purge process has to go through multiple rounds of testing by a separate QA team.
  3. The worst effect of all is that delivery cycles just get longer, as your dev and ops teams are worried about bureaucracy and distracted from the things that really matter, like taking calculated risks to deliver business value as fast as possible.

The Correct Metric: System Uptime

I would argue that tracking system uptime, accounting for both planned and unplanned downtime, is a far more useful metric than the number of P1s or failed CRs.

First, unplanned downtime is a good indicator of the P1s that matter, since not all P1s are created equal: a crashed load balancer that takes your whole website offline is a lot worse than a rogue service instance that causes 5% of your call-center agents to fail to book a room on the first try. Tracking system downtime gives you a more normalized view of the problem than counting P1s.
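To make that concrete, here is a minimal sketch in Python of what impact-weighted availability could look like. The incident data and field names are hypothetical, and weighting downtime by the fraction of users affected is just one reasonable way to normalize:

```python
from dataclasses import dataclass

@dataclass
class Incident:
    """One outage window; fields are hypothetical, for illustration."""
    minutes_down: float  # how long the incident lasted
    impact: float        # fraction of users/traffic affected, 0.0-1.0

def availability(incidents: list[Incident], period_minutes: float) -> float:
    """Impact-weighted availability over a reporting period.

    A full outage (impact=1.0) counts its whole duration; an incident
    affecting 5% of users counts only 5% of its duration.
    """
    weighted_downtime = sum(i.minutes_down * i.impact for i in incidents)
    return 1.0 - weighted_downtime / period_minutes

# Two months with the exact same P1 count (one each):
month = 30 * 24 * 60  # minutes in a 30-day month
site_down = [Incident(minutes_down=60, impact=1.0)]    # whole site offline for 1h
rogue_node = [Incident(minutes_down=60, impact=0.05)]  # 5% of agents affected for 1h

print(f"{availability(site_down, month):.5%}")   # 99.86111%
print(f"{availability(rogue_node, month):.5%}")  # 99.99306%
```

Both months report exactly one P1, but the availability numbers make it obvious which one actually hurt the business.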

Second, it takes the pressure off the devs to have the paperwork done to a tee, and focuses them on making sure their CR will not cause any downtime. Even if the CR ends up being rolled back, as long as the dev took proper precautions to ensure no downtime, everything is good. It also encourages your team to build more resilient architectures: hot-swappable service instances, rolling upgrades, automated deployments/rollbacks, etc.
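As one example of what "automated rollback" can look like in practice, here is a minimal sketch in Python. The health-check URL and the deploy/rollback script are hypothetical stand-ins for whatever tooling you actually use:

```python
import subprocess
import time
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # hypothetical health endpoint

def healthy(url: str, attempts: int = 5, delay_s: float = 2.0) -> bool:
    """Return True if the service answers HTTP 200 within a few retries."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # service not up yet; retry
        time.sleep(delay_s)
    return False

def deploy(version: str) -> None:
    """Deploy a version, then roll back automatically if health checks fail."""
    # Hypothetical deploy script; swap in your real deployment tooling.
    subprocess.run(["./deploy.sh", version], check=True)
    if not healthy(HEALTH_URL):
        # The CR "failed", but the automation keeps downtime near zero.
        subprocess.run(["./deploy.sh", "--rollback"], check=True)
        raise RuntimeError(f"{version} failed health checks; rolled back")

deploy("v1.42.0")
```

Under an uptime metric, this kind of guardrail is exactly what gets rewarded: the rollback isn't a black mark, because the downtime it prevented is what's being measured.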

Conclusion

As I have argued here, tracking the wrong metrics can have devastating effects on your organization's speed and effectiveness.

Tracking the wrong metrics not only reduces the effectiveness of your data-driven decision making, it actively undermines the productivity and efficiency of your teams.

Think long and hard the next time you want to tie your team's performance bonus to a reduction in the number of P1s, unless you want your organization to end up like the Japanese police[1][2].

1. http://articles.latimes.com/2007/nov/09/world/fg-autopsy9
2. https://www.vox.com/world/2015/12/13/9989250/japan-crime-conviction-rate

Manuj Bhatia
