Bill James, the noted writer and historian who began sharing his thoughts on baseball via “The Bill James Baseball Abstract” in 1977, is one of the forefathers of the statistical study of the sport. So when he decides to weigh in on one of baseball’s MVP races, and the role advanced metrics such as wins above replacement (WAR) should or should not play in it, people like me take notice.

And this is particularly true when he writes that using the ever-popular WAR to argue that New York Yankees right fielder Aaron Judge was more valuable than Houston Astros second baseman Jose Altuve is “nonsense.”

“Why?” James writes. “Because [Judge] didn’t do nearly as much to win games for his team as Altuve did. It is NOT close. The belief that it is close is fueled by bad statistical analysis.”

He continues his case by criticizing WAR’s characterization of the American League MVP race:

“Baseball-Reference WAR shows the little guy [Altuve] at 8.3, and the big guy [Judge] at 8.1. But in reality, they are nowhere near that close. I am not saying that WAR is a bad statistic or a useless statistic, but it is not a perfect statistic, and in this particular case it is just dead wrong. It is dead wrong because the creators of that statistic have severed the connection between performance statistics and wins, thus undermining their analysis.”

Why wage war on WAR? Because it has become the de facto way the league’s MVP races are decided.

For example, the National League MVP has finished first in FanGraphs’ version of WAR (fWAR) every year since 2012, with this year’s winner, Giancarlo Stanton, tying with Anthony Rendon for the most fWAR in 2017. The American League’s track record isn’t as strong, but the eventual MVP has ranked no lower than third in fWAR for the season since 2013, with this year’s MVP, Jose Altuve, ranking third (7.5 fWAR) behind Aaron Judge (8.2) and Chris Sale (7.7).

The most famous debate over WAR’s role in the MVP race was in 2012. Miguel Cabrera (6.4 WAR, 5th overall) won the AL MVP vote on the heels of baseball’s first Triple Crown since 1967 despite being worth four fewer wins above replacement than Mike Trout (league-leading 10.3 WAR), who finished second in the voting. It wasn’t that leading the league in batting average, home runs and RBI was, in and of itself, unimpressive to the WAR camp. It was more that those stats are seen as anachronistic in the wake of baseball’s evolution toward advanced metrics.

Batting average looks solely at hits, and weights a single the same as a home run, despite the obvious benefits of the latter. Now we look at a player’s on-base and slugging percentages to get a better idea of how good a performance we are seeing. The RBI is an opportunity stat: the more the players ahead of you in the batting order get on base, the more likely you are to knock in a run. In 2012, Cabrera came to the plate with 444 runners on base over 333 plate appearances, the fourth-most such appearances that year, compared to just 306 runners in 214 appearances for Trout, which ranked 133rd. Give Trout the same batting situations as Cabrera in terms of men on base and he likely ends up with 24 more RBI. To those studying the new era of stats, there are better alternatives, ones James himself helped introduce.

But WAR is not beloved by James. It is worth noting this isn’t the first time James has come out against all-in-one metrics like WAR. In the 1986 “Baseball Abstract,” which debuted his player ratings, he cautioned against “great statistics” because “[t]he real baseball world is inevitably going to be hundreds of times more complicated than the model that we construct.” His problem with WAR this time around is again centered on how the metric is constructed, specifically how it gives players credit for wins that are not there.

He illustrates this with the 2017 New York Yankees, a team that scored 858 runs and allowed 660, which would, based on James’s Pythagorean win expectation, project to a 100-win season. The Yankees, for a multitude of reasons, won just 91 games. However, because the WAR statistic uses runs created as its building block, New York’s players are being given credit for, and dividing among them, nine more wins than the team actually won in 2017. Houston, by comparison, scored 896 runs and allowed 700, and its final win-loss record was two wins better than expected. If we discount each player’s WAR by the difference between his team’s actual and expected win totals, the AL leaderboard looks a lot different.

Name           Team      Actual fWAR   Adjusted fWAR
Jose Altuve    Astros    7.5           7.7
Aaron Judge    Yankees   8.2           7.3
Mike Trout     Angels    6.9           7.3
Jose Ramirez   Indians   6.6           6.3
Mookie Betts   Red Sox   5.3           5.7
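
The Pythagorean arithmetic behind that projection can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not James's or FanGraphs' actual calculation. The exponent of 1.83 is an assumed, commonly used variant of the formula (the original version uses 2), the function names are invented for this example, and the proportional scaling of fWAR is only one plausible way to discount a player's total, so it will not necessarily reproduce the adjusted figures in the table. The Astros' 101 wins is their 2017 record, consistent with finishing two wins above expectation.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Expected wins from runs scored and allowed.

    The exponent of 1.83 is an assumption; the classic formula uses 2.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return games * scored / (scored + allowed)


def scale_war(war, actual_wins, expected_wins):
    """Hypothetical adjustment: scale WAR by actual wins over expected wins."""
    return war * actual_wins / expected_wins


# Run totals and the Yankees' 91 wins are taken from the article.
yankees_expected = pythagorean_wins(858, 660)
astros_expected = pythagorean_wins(896, 700)

# fWAR values are the "Actual fWAR" figures from the table.
judge_scaled = scale_war(8.2, 91, yankees_expected)
altuve_scaled = scale_war(7.5, 101, astros_expected)

print(f"Yankees expected wins: {yankees_expected:.1f}")
print(f"Astros expected wins:  {astros_expected:.1f}")
print(f"Judge scaled fWAR:     {judge_scaled:.1f}")
print(f"Altuve scaled fWAR:    {altuve_scaled:.1f}")
```

With these inputs the Yankees come out at roughly 100 expected wins and the Astros at roughly 99, which matches the nine-win shortfall and the two-win surplus described above. Under this scaling Judge's total drops and Altuve's rises, the same direction as the table.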

But even after adjusting WAR for actual wins, the calculations come with enough uncertainty that it’s difficult to produce a clear-cut most valuable player. [Note: That’s why I use other metrics, such as how many runs a player creates above or below league average (wRC+) and a player’s performance in context-neutral situations, when asked to write about postseason awards.]

James advocates using clutch performance as well, a hot topic of debate among baseball analysts for some time. When you look at how well Judge and Altuve have done with the game on the line, in addition to their adjusted WAR, it becomes clear the voters got it right when they named Altuve MVP.

In high-leverage situations

Name          Actual fWAR   Adjusted fWAR   Walk rate   Strikeout rate   OPS     wOBA    wRC+
Jose Altuve   7.5           7.7             12%         10%              0.877   0.373   138
Aaron Judge   8.2           7.3             15%         25%              0.877   0.334   107

The Baseball Writers’ Association of America has steadily grown to embrace WAR in its award voting, and I doubt this will do anything to quell WAR’s popularity in that regard. After all, it is a very good metric in terms of giving a broad view of a player’s worth. Yet even its staunchest advocates, such as FanGraphs, which warehouses WAR and other advanced metrics used throughout baseball analysis, would advise you to put the number in context to get a complete, holistic view of a player’s overall value.

With his critique, however, James, a figure revered by many in the advanced metrics camp, prompts an interesting discussion: If WAR, a stat created to paint a full picture of a player’s value, can’t tell us who the most valuable player is, what is it good for?

The beauty of WAR, for those who don’t want to bother with a host of detailed metrics to paint a more precise picture of a player, is that it appears to be an all-in-one assessment that makes player evaluation significantly easier. But if that’s not the case, and even FanGraphs acknowledges more context is needed, then that powerful simplicity erodes. If writers and analysts who want to use WAR as an absolute measurement of a player’s value are told that WAR is, in fact, not an absolute barometer of player value, what happens to the stat’s usage?

James’s work has always had great influence upon the metrics community. The result of his latest statement could have a profound impact on a stat that was becoming increasingly embraced both within and without that community.
