Week 2: Metric Design

Jenny’s first week since the hospital visit had been tough. She’d been collecting data from around the farm for a few years, just as a hobby. She loved to check in every once in a while on the average temperature readings, milk prices, and anything else she could turn into a number. Sometimes the results would surprise her, and even allow her to fix an issue neither she nor John would have noticed otherwise. But it had always been an exploratory thing, like throwing some paint on a canvas and seeing if it looked interesting. Now everything needed to drive towards a single goal: $1000 per week. It was like standing there with the nude model in front of you, a disorganized box of art supplies at your side, and absolutely no idea what to do.

Designing Metrics

A dashboard can’t simply show a stakeholder a rapidly scrolling list of data points, like the temperature readings that we peeked at in the first class. One of the first decisions that a dashboard designer needs to make is how to roll the operational data into metrics: aggregate numbers that represent a critical business concept. John and Jenny already have one metric, the amount of money in their bank account. It represents the rolled-up sum of all of the income and costs that they have incurred up until that point. That’s their goal metric, but it gives no insight into what aspects of their operation they could be improving. They need intermediate metrics, such as the amount of milk their cows have produced. They also need the ability to monitor inputs to the system which they do not control, but which they may want to react to, such as the price of feed.

There can be a surprising number of complexities in metric design, so we’ll walk through some of the common issues that you may face in this coming class session:

Lagged Data

When a batch of milk is made, a sample needs to be sent out for sanitary tests, which take anywhere from 5 to 10 days. It would make sense to keep a metric of what percentage of the milk fails these tests. But if they start receiving failure warnings today, it’s not because of a change they made yesterday. Instead, the spike needs to be attributed to the day the batch was produced. However, the testing lag is variable, so there will be a period where they only have some of their results. If, on the morning of 8/6/2015, they printed out a graph that goes up to 8/1/2015, they may find that the data point for that date moves when they refresh the graph the next day, and it can keep moving all the way until 8/11/2015.
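As a rough illustration of attributing results to the production date, here is a minimal Python sketch. The batch records, the function name, and the idea of reporting a "fraction of results in" next to the failure rate are all assumptions made for this example, not something taken from Jenny's actual tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (production_date, result_date, passed).
# All four batches were produced on 8/1; results arrive 5 to 10 days later.
BATCHES = [
    (date(2015, 8, 1), date(2015, 8, 6), True),
    (date(2015, 8, 1), date(2015, 8, 6), True),
    (date(2015, 8, 1), date(2015, 8, 9), False),
    (date(2015, 8, 1), date(2015, 8, 11), False),
]

def failure_rate_by_production_date(batches, as_of):
    """Return {production_date: (failure_rate, fraction_of_results_in)}
    using only the test results that had arrived by `as_of`."""
    produced = defaultdict(int)
    tested = defaultdict(int)
    failed = defaultdict(int)
    for production_date, result_date, passed in batches:
        if production_date > as_of:
            continue  # batch did not exist yet
        produced[production_date] += 1
        if result_date <= as_of:
            tested[production_date] += 1
            if not passed:
                failed[production_date] += 1
    summary = {}
    for day, n_produced in produced.items():
        # No results yet means the rate is unknown, not zero.
        rate = failed[day] / tested[day] if tested[day] else None
        summary[day] = (rate, tested[day] / n_produced)
    return summary

view_on_aug_6 = failure_rate_by_production_date(BATCHES, date(2015, 8, 6))
view_on_aug_11 = failure_rate_by_production_date(BATCHES, date(2015, 8, 11))
```

With this made-up data, the 8/1 point shows a 0% failure rate on 8/6 with only half of the results in, and settles at 50% once every result has arrived on 8/11. Showing the completeness figure alongside the rate is one way to warn the reader that a point may still move.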

Retailers that promise lifetime warranties can never say with certainty what their final 2014 refund rate will be. It will rise rapidly throughout the year, but continue to rise in 2015, and probably inch upwards in 2016. These ‘unstable metrics’ can be a headache to deal with, but are the only accurate representation of many business processes with latency.

Metrics with “Whales”

The farm sells about 10% of its milk directly to a local maker of artisanal eggnog, who pays 13x as much per kg as the other purchasers. When they place a big order before Christmas, it makes the “Average price per kg sold” metric explode upwards. That’s accurate in some sense, but it could be misinterpreted to mean that they’re getting better prices across the board.
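The effect can be checked with a little arithmetic. The sketch below is only an illustration: the base price of 1.0 per kg, the weekly volumes, and the 30% Christmas share are invented numbers, while the 10% share and the 13x price come from the example above.

```python
def average_price_per_kg(sales):
    """Volume-weighted average price for a list of (kg_sold, price_per_kg) pairs."""
    total_kg = sum(kg for kg, _ in sales)
    revenue = sum(kg * price for kg, price in sales)
    return revenue / total_kg

BASE_PRICE = 1.0               # hypothetical price paid by ordinary purchasers
WHALE_PRICE = 13 * BASE_PRICE  # the eggnog maker pays 13x as much

typical_week = [(900, BASE_PRICE), (100, WHALE_PRICE)]    # whale buys 10% of volume
christmas_week = [(700, BASE_PRICE), (300, WHALE_PRICE)]  # hypothetical big order: 30%

typical_avg = average_price_per_kg(typical_week)      # 2.2
christmas_avg = average_price_per_kg(christmas_week)  # 4.6

# The same metric with the whale left out does not move at all.
typical_without_whale = average_price_per_kg(typical_week[:1])
christmas_without_whale = average_price_per_kg(christmas_week[:1])
```

The blended average more than doubles even though no purchaser is paying a different price. Reporting the metric with and without the whale, as in the last two lines, is one possible way to keep the two stories apart.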

Imagine you’re sitting on a small prop airplane as passengers board. Unfortunately, a college football player sits down next to you. You’ll be annoyed, but not worried about the plane’s ability to withstand the additional weight, probably 50%-75% more than the average American man.

Now imagine that everyone started carrying around their wealth in gold. As a middle-income American, you’re sitting there with approximately $97k of gold, confident that the plane is sturdy enough for passengers carrying up to 175% of that. Unfortunately, a pair of upper-income passengers get on, dragging their $639k of gold, each more than 6 times heavier than expected. But upper-income is roughly the top 20% of Americans. If even a single passenger from the 1% arrived, they would be bringing (at minimum) $8.4 million of gold onto the plane, and single-handedly doubling its total weight.

We tend to think of most metrics as being like height or body weight, which rarely vary by more than a fraction of the mean. Some metrics, however, can vary by so much that they are essentially measuring the movement of a single outlier. Unfortunately for John, healthcare spending is one of them, with 5% of Americans accounting for nearly half of the national cost. If he doesn’t manage his condition, his expenses could easily grow to dwarf every other aspect of the farm’s budget.

Metrics with near-zero values

The primary cost of producing milk is feed for the cows. A “kg of milk per kg of feed” metric makes sense, but if they had tried to plot the cows in terms of efficiency, they would reliably see Agnes and Beatrice alternating places between the far top (by 10x the average) and the far bottom (often at 0). They are old and sickly, so they eat (and produce) far less than the average cow. Since their denominator is so low, the natural variance can be a huge percentage of the mean. For example, if they normally eat only 6kg a day, but swing by 5kg in either direction, the denominator can vary by 11x from its lowest to highest point. Additionally, since feed isn’t turned immediately into milk, it’s quite possible for them to produce more kg of milk than they ate in feed in a small measurement period. There will also be plenty of periods where they produce no milk at all.
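To see how violently a small denominator can swing a ratio, here is a short Python sketch. The daily figures for one cow are made up (feed swinging between 1kg and 11kg around the 6kg norm), and pooling over a longer window is just one common way to steady such a metric, not necessarily the approach the farm should take.

```python
# Hypothetical six days for one cow: feed swings between 1kg and 11kg,
# and milk shows up a day or so after the feed that produced it.
FEED_KG = [1, 11, 6, 1, 11, 6]
MILK_KG = [3, 0, 3, 3, 0, 3]

def daily_ratios(milk_kg, feed_kg):
    """kg of milk per kg of feed, computed separately for each day."""
    return [milk / feed for milk, feed in zip(milk_kg, feed_kg)]

def pooled_ratio(milk_kg, feed_kg):
    """kg of milk per kg of feed over the whole window (sum over sum)."""
    return sum(milk_kg) / sum(feed_kg)

ratios = daily_ratios(MILK_KG, FEED_KG)   # jumps between 0.0 and 3.0
pooled = pooled_ratio(MILK_KG, FEED_KG)   # 12 / 36, about 0.33
```

The daily values bounce between 0 and 3 kg of milk per kg of feed, while the pooled figure for the same six days is a steady one third. The trade-off is that a longer window reacts more slowly to real changes.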

Proxy Metrics

It would be great to measure exactly what the cows eat, but sampling each of their guts wouldn’t be feasible. It’s far easier to record what fields they are grazing in, and take a random sample of the grass for nutritional analysis. Combining knowledge of the field’s vegetation density, nutritional content, and grazing time, one could assign each cow a metric whose units would be in (protein-kg per field-meter squared)*grazing-minutes per month. It’s a very abstract number, and it will be hard to intuit what number represents “good enough”, but it’s better than having no insight at all.
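One possible way to compute such a proxy is sketched below in Python. The field names, the protein densities, and the grazing log are hypothetical values chosen for illustration; the only thing taken from the description above is the shape of the calculation, protein density multiplied by grazing minutes and summed over a month.

```python
# Hypothetical protein density per field, in protein-kg per square meter,
# as estimated from the random grass samples.
PROTEIN_DENSITY = {"north": 0.02, "creek": 0.01}

def grazing_proxy(sessions, protein_density):
    """Sum density * minutes over one cow's grazing sessions for a month.

    sessions: list of (field_name, grazing_minutes) pairs.
    Units: (protein-kg per square meter) * grazing-minutes per month.
    """
    return sum(protein_density[field] * minutes for field, minutes in sessions)

# Hypothetical monthly log for one cow.
agnes_sessions = [("north", 300), ("creek", 600)]
agnes_score = grazing_proxy(agnes_sessions, PROTEIN_DENSITY)
```

The resulting number has no intuitive meaning on its own, so it is most useful for comparing cows with each other or one cow with her own history.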

Another type of proxy metric is a “sentinel metric”, used to put a number on a broader, more qualitative attribute. The health of a herd can’t be boiled down to a single number, but is obviously important. Several sentinel metrics might be employed, such as “% of herd with internal temperatures above 100F” and “Average weight of cows”. Neither of these captures the whole complex picture, and either one could move for reasons besides health, but they are again the best indicators that can easily be measured.



Our Next Steps

In class, we’re going to look at more of the data tables that Jenny has been collecting, and use them to design metrics that they can use to identify areas for improvement around the farm.