Metrics and Merit
Nov 7, 2021
Some thoughts on using metrics as a way to evaluate merit
Friends, likes, followers, grades, years of experience, SAT scores, retweets. All of these are metrics that the average American is likely familiar with, and it’s little wonder why. Metrics pervade our lives in subtle but never-ending ways. In our current political atmosphere we are constantly drilled with numbers: global temperatures, money spent, national debt figures, and an assorted array of percentages. There is a promise there: these numbers are too high or too low, and the goal is to reduce or raise them. When you shop online, or try to choose a book or a movie to pick up, it is likely that you will look at reviews to help inform your decision. This is no accident, but a byproduct of the environment in which we make those decisions. Muller suggests in The Tyranny of Metrics that “The quest for numerical metrics of accountability is particularly attractive in cultures marked by low social trust” (40). On the Internet, low social trust is certainly an issue, and metrics are a very smooth way to bypass issues of trust in a digital environment. Metrics have pervaded more aspects of culture than mere purchases, though: athletes are measured, employees are measured, and companies use metrics to inform decisions. In a society of metrics, the critical question is: are we using them appropriately? The answer is complicated, because it is highly contextual. When used well, metrics are incredible tools, but when used incorrectly, they can be extremely damaging to the fabric of society. The key to avoiding harm with metrics is perspective. A metric is useful so long as it is used as a means of communication, and not as a judgement or evaluation in and of itself.
Metrics are appropriate tools in the context of systems, games, or structures of any sort that have a desired end goal to achieve. In other words, metrics excel where there are inputs and outputs. However, that does not mean that metrics should be used with abandon in any such structure or system. Muller points out that incorrectly used metrics are often harmful. They can lead to systemic gaming or outright cheating if the wrong metrics are emphasized as goals, or if simple metrics are used to determine success in complex processes (Muller 23-24). In other words, metrics fail in systems when they are used as forms of judgement. For example, suppose a company that produces tape dispensers chooses to focus on individual employee production rates as its metric of success, rather than simply the number of dispensers produced or some other factor. It might find that stringent production requirements for employees lead to more low-quality dispensers and thus less satisfied customers. By using production rates as a judgement of each employee’s quality, the company has subverted its own purpose.
However, when the right metrics are used in the right way (as a way to communicate attributes or performance), they can have an incredible impact on optimization. In the movie Moneyball, the Oakland A’s were able to turn around a miserable season because their General Manager, Billy Beane, changed his perspective on the metrics being used in baseball. The strategy switched from trying to find generally talented players with looks and charisma to trying to find players who could get the team the most wins. This subtle shift made a world of difference for the Oakland A’s in the early 2000s, and changed the face of baseball. The old standards of evaluating players were inherently judgement-based. Advisors considered players “good” or “not good” based on metrics that were incorrect and wrongly used. Beane’s strategy didn’t make any judgements about the players; it only took into account their ability to get runs. Focusing on the runs, not making judgements about the players, was key to getting wins.
A more tangible example of an inappropriate use of metrics is their use in aesthetic judgement. Although there are special arguments to be made for ratings of books or films on review aggregator websites, generally speaking, the use of metrics to judge or evaluate a work of art or aesthetic experience is inappropriate. Richards argues in Principles of Literary Criticism that there are criteria by which the quality of a work can be determined, and that the degree to which the art meets those criteria is the determining factor of its quality. Richards suggests that an artwork is bad if its communication is “defective”, or if what it is communicating is of no value (185). These are good criteria, but attaching metrics to them gets messy. For example, should one try to impose a metric standard on the rating of a short story, one might attempt to break down the “communication” aspect into components such as plot, dialogue, atmosphere, prose, and character development. Attaching a numerical value to any of these aspects is well enough, but what of the worthiness of the communicated experience? That is trickier. Trickier still is reconciling those aspects. Suppose you thought the story was well written, but you didn’t enjoy reading it at all: is ascribing a metric to that aesthetic experience valid? It may be attempted, but it will fail as a valuable judgement of the aesthetic experience. However, there is an argument to be made that ascribing a general numerical value to an aesthetic experience might work as a communication of enjoyment or of the experience’s impact on the viewer. For that reason, review aggregator websites are not so outrageous: the reviews collected are merely useful communications of enjoyment, not absolute declarations of quality. They are helpful when deciding on a movie or novel, since you can reference the communicated average enjoyment of other observers to decide whether or not to engage with the work and make a more personal absolute judgement.
One of the supreme failures in the use of metrics is over-simplification. To say a movie is the best of all time because it has the highest review score on a website is simplistic and demeaning, but to use that metric of enjoyment as an assessment of appeal, as one aspect of the movie’s qualities, is more appropriate. The more nuanced perspective, and the most honest one, would be to say that it has a high rating for reasons x, y, and z. The merit is in x, y, and z, not in the score that results from them. Ultimately, metrics are appropriate when used in the right context and as a means of communicating an attribute or a bit of information which can then be useful in making a judgement or evaluation. A metric is inappropriate when used as a judgement or evaluation in and of itself.