“In the United States we like to ‘rate’ a President. We measure him as ‘weak’ or ‘strong’ and call what we are measuring his ‘leadership.’ We do not wait until a man is dead; we rate him from the moment he takes office.”

So begins Richard Neustadt’s classic 1960 book Presidential Power. Fifty-odd years later, steeped as we are in the online instant analysis of the horse race, the top ten list, and a proliferation of tiers and hierarchies, his emphasis seems more apt than ever. Efforts to rate and order presidencies have become something of a cottage industry in their own right. These go back to the iterated surveys of presidential performance taken by Arthur Schlesinger Sr. and Jr. (in 1948, 1962, and 1996), and have been added to by others who sought to redress what they saw as a Democratic bias in the Schlesingers’ samples of respondents (as with the Wall Street Journal and Federalist Society surveys, as well as Alvin Felzenberg’s more idiosyncratic take), and by the wider recent efforts of C-SPAN.

The newest addition – a survey by political scientists Brandon Rottinghaus and Justin Vaughn, just published and highlighted on the Monkey Cage – is a terrific contribution to that industry. I will forgive their respondents’ poor opinion of Franklin Pierce (Bowdoin ’24!); indeed I guess I’d better, as I was one of them.

There are of course a variety of useful caveats to be noted — see, for example, the comments of Jonathan Bernstein and Julia Azari, among others. Yet I can’t help wondering more broadly – if, as suggested above, slightly hypocritically – about the larger enterprise of the ranking exercise.

For one thing, even if we assume a fair sample of respondents (and the new survey draws on people who should know what they are talking about), a number of issues arise. Even smart scholars tend both to know more about recent presidencies and to judge them more politically than they do those further in the past. And it’s worth noting that presidential incumbents are, almost to a person, outliers – located on the far positive reaches of any scale measuring American political aptitude and skill. This makes for a problematic bell curve when they are isolated into a single population. We might see the top and bottom as clearly differentiable, but is there any meaningful gap in performance between someone ranked #11 and someone ranked #21 in such a small set of observations?

Further, the rankings themselves change over time. (As Azari notes, “presidential history is American history.”) Here, Harry Truman is the classic case: a president widely unloved by the electorate at the close of his term but one whose stock has risen steadily since. (It doesn’t hurt to have David McCullough write a book about you.) Dwight Eisenhower, too, has seen his rankings rise over time – in his case, to match his extant public popularity – as a fuller internal record of his presidency became available. Neustadt, for one, downplayed Eisenhower’s executive skills. But early judgments can mislead, even within a single individual’s tenure in office. Indeed, the assessment of the George W. Bush legacy immediately after his 2004 reelection – as observers lauded the Republican majority realignment apparently achieved – was in sharp contradiction to the picture four years later. By contrast, in this survey at least, Bill Clinton has wiped the scarlet “I” of impeachment off his copybook.

The deeper concerns with rankings, though, are inherent in the difficulty of (1) choosing the right standards for measurement and (2) assigning credit or blame to isolated individuals in the separated system of American governance. As Donald Rumsfeld famously briefed, “Stuff happens” – but the fact of its happening during a president’s term doesn’t mean the president made it happen. (West Wing fans will recall why.) There is a wide range of governmental outputs, and outcomes, not all of them attributable to presidential action – even if the president did in fact prefer a given outcome. What outcomes has Barack Obama personally effected? Which of those should receive more weight in our retrospective assessment? Can we give credit for a good decision, even if it had a bad outcome? And is a decision or outcome “bad” if it conflicts with a later code of morality or with a latter-day judge’s ideological preferences? The institution of slavery and the treatment of Native Americans, for two, loom rather nastily over the earlier presidents.

Along these lines we must also recall that different presidents enter office under distinct political circumstances that expand or constrict the options available for presidential achievement. Bill Clinton moaned to his advisers that no one could be a great president without a national emergency, a thought channeled by Obama chief of staff Rahm Emanuel a decade later when he urged his boss to “never waste a crisis.”

How does all this play into the mechanics of the rankings? The early surveys ranked presidents as “great,” “near-great,” etc., on the grounds that – like pornography – you know greatness when you see it.

C-SPAN, in its 2009 rankings, instead asked scholars to grade past presidents on no fewer than ten “attributes of leadership.” These cut across a wide range of areas, encompassing not only skills and policy arenas but also perceptions: public persuasion, crisis leadership, economic management, moral authority, international relations, administrative skills, relations with Congress, vision/agenda setting, “pursuing equal justice for all,” and broad “performance within context of the times.” This makes some sense in that it invites us to consider the multiple skills required for success in the job and the multiple dimensions of any presidency. (How do we feel about personal behavior, for instance, as opposed to the content of that person’s public policy?) Providing a series of categories lets presidential raters make some distinctions along these lines, so that we can praise Lyndon Johnson’s commitment to voting rights while decrying his decision to escalate in Vietnam.

Yet when translated into scores, those assessments across a long list of categories are compacted, equally weighted, into a single score, which assumes that each category is of equal importance, both within a presidency and across time. Does “performance within the context of the times” have the same value as “administrative skills” or “relations with Congress”? (Indeed, does the former subsume the others in any case?) Do “economic management” and “moral authority” count simply as two equivalent questions on a multi-part exam? Should a bad television presence cancel out a good nuclear crisis?
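
To see why the weighting matters, here is a minimal sketch in Python. The two presidents, their scores, and the alternative "crisis-heavy" weights are all invented for illustration; only the three category names are borrowed from C-SPAN's list, and nothing here reproduces C-SPAN's actual data or scoring procedure.

```python
# Hypothetical scores (0-100) on three of the ten categories.
# Presidents and numbers are made up; they are not survey results.
scores = {
    "President A": {
        "public persuasion": 90,
        "crisis leadership": 50,
        "economic management": 60,
    },
    "President B": {
        "public persuasion": 50,
        "crisis leadership": 85,
        "economic management": 60,
    },
}


def composite(category_scores, weights):
    """Weighted average of one president's category scores."""
    total_weight = sum(weights[c] for c in category_scores)
    weighted_sum = sum(category_scores[c] * weights[c] for c in category_scores)
    return weighted_sum / total_weight


def ranking(all_scores, weights):
    """Names ordered from highest to lowest composite score."""
    return sorted(
        all_scores,
        key=lambda name: composite(all_scores[name], weights),
        reverse=True,
    )


# Equal weighting: every category counts the same.
equal = {
    "public persuasion": 1,
    "crisis leadership": 1,
    "economic management": 1,
}

# An assumed alternative: crisis leadership counts three times as much.
crisis_heavy = {**equal, "crisis leadership": 3}

print(ranking(scores, equal))
print(ranking(scores, crisis_heavy))
```

With equal weights, President A comes out ahead; triple the weight on crisis leadership and the order flips, though nothing about either presidency has changed. The choice of weights is itself a judgment about what matters.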

What we want to know is the “value added” of having a specific person in the Oval Office, controlling for context. We can put the question in intuitive terms without casting too far back in history: comparing Gore v. Bush and the aftermath of 9/11, certain parts of history would surely have gone along similar lines, but others would likely have diverged significantly. The variance represents (some of) the Bush difference. Still, we’re stuck with counterfactuals, when slugging percentage and fielding range — or better yet, “wins above replacement” — would be much more satisfying things to know.

In short, presidential rankings are short on sabermetrics, and often on nuance.

And yet, as Neustadt noted, the exercise is both irresistible and important. (He ended the paragraph cited at the outset: “we are quite right to do so.”) Exercises like the Rottinghaus and Vaughn survey help us think about what we value in leadership and in our history. Marc Landy and Sidney Milkis have a short and sweet assessment of greatness: whether the president transforms how Americans view their government, winning a “struggle for its constitutional soul.”

You know that, presumably, when you feel it.