The problem of overfitting elections, in one brilliant cartoon

October 18, 2012

Nate Silver's "The Signal and the Noise" (recommended, buy it, read it, etc.) includes a great discussion of the problem of "overfitting": seeing apparent relationships in data that aren't really there. His example:

A once-famous “leading indicator” of economic performance, for instance, was the winner of the Super Bowl. From Super Bowl I in 1967 through Super Bowl XXXI in 1997, the stock market gained an average of 14 percent for the rest of the year when a team from the original National Football League (NFL) won the game. But it fell by almost 10 percent when a team from the original American Football League (AFL) won instead. Through 1997, this indicator had correctly “predicted” the direction of the stock market in twenty-eight of thirty-one years. A standard test of statistical significance, if taken literally, would have implied that there was only about a 1-in-4,700,000 possibility that the relationship had emerged from chance alone.

There was, of course, no actual stock market bias against the AFL, as the next few Super Bowls showed.

Overfitting is a particular problem during elections, when there's always some spurious attempt to explain the election through something totally unrelated like, well, football. For instance: If the Redskins win their last home game before the election, the incumbent party will hold the White House. This has held true in 16 of the last 17 elections (2004 was the exception). It's also a ridiculous example of overfitting.
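To see why a 16-of-17 record is less impressive than it sounds, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are mine, not Silver's: each candidate "indicator" is treated as an independent fair coin flip against the election result, and the pool of 10,000 candidate indicators is an arbitrary illustrative number for how many sports results, hemlines, and weather patterns someone could go looking through.

```python
from math import comb


def p_match_at_least(k: int, n: int) -> float:
    """Chance that one coin-flip indicator matches at least k of n outcomes."""
    # Count the ways to get k, k+1, ..., n matches out of n fair flips.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def p_any_match(k: int, n: int, candidates: int) -> float:
    """Chance that at least one of `candidates` independent indicators does so."""
    p = p_match_at_least(k, n)
    return 1 - (1 - p) ** candidates


# Assumption: a meaningless indicator is a 50/50 guess each election.
single = p_match_at_least(16, 17)

# Assumption: 10,000 candidate indicators, a made-up illustrative pool size.
pool = p_any_match(16, 17, 10_000)

print(f"one indicator, 16+ of 17: {single:.6f}")
print(f"best of 10,000 indicators: {pool:.2f}")
```

Under those assumptions, any single meaningless indicator has only about a 1-in-7,300 chance of going 16 for 17, but the odds that at least one indicator in a pool of 10,000 does so are roughly three in four. Search through enough noise and something will always look like a signal.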

But perhaps you don't want to read a long (though very good!) book on common statistical mistakes. Perhaps you prefer cartoons. In that case, the brilliant XKCD has you covered:

