Using statistical analysis to estimate real-world effects can be tricky. Often we’re not measuring quite what we want to measure (consider survey questions on topics ranging from happiness to how much people are willing to pay to reduce some risk) or not measuring directly at all (for example, when researchers use state- or country-level averages to draw inferences about individual motivation or behavior).
For some questions that spur wide interest, the data just aren’t there to give a definitive answer. For example, various researchers have tried to estimate the deterrent effect of the death penalty, but as John Donohue and Justin Wolfers discussed in a paper several years ago, it’s basically impossible to get leverage on the question because the death penalty tends to be used in tandem with other crime-fighting measures. Occasionally there have been “natural experiments” in which the death penalty suddenly becomes legal or illegal, but there just don’t seem to be enough of these to learn anything that’s even close to conclusive.
One way we can get a handle on extreme claims is to look at the numbers.
For example, Phoebe Clarke and Ian Ayres recently claimed in various places (the sports site Deadspin, the Freakonomics blog and the Journal of Socio-Economics) that “sports participation [in high school] causes women to be less likely to be religious . . . more likely to have children . . . more likely to be single mothers.” This is all possible, but oddly enough the claim was made without any reference to individual-level data. And, to bring us to our point here, the advertised effects were huge: “a ten percentage-point increase in state-level female sports participation generates a five to six percentage-point rise in the rate of female secularism, a five percentage-point increase in the proportion of women who are mothers, and a six percentage-point rise in the proportion of mothers who, at the time that they are interviewed, are single mothers.”
As I wrote on our statistics blog, these effects are huge to start with (elasticities of 50 percent for outcomes that apparently have nothing to do with the treatment), and they look even larger once you consider that the outcomes are binary: sports participation can’t make you secular if you were already going to be secular anyway, and it can’t cause you to have a child if you were already going to have one. The claimed effects are basically mathematically impossible. The implication would be that there is an enormous group of girls who (i) will have children if they play sports and (ii) will not have children if they do not. In this particular analysis, the estimated elasticities have to be driven by big differences between states that may well have nothing to do with high school sports.
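To make the back-of-envelope arithmetic concrete, here is a small illustrative calculation. The percentage-point figures come from the published claim; the “switcher” framing is my own way of stating the implication:

```python
# Claimed: a 10-percentage-point increase in state-level female sports
# participation produces a 5-point rise in, e.g., the share of mothers.
delta_participation = 0.10  # change in sports participation
delta_outcome = 0.05        # claimed change in the binary outcome

# Implied effect per newly induced participant.
elasticity = delta_outcome / delta_participation
print(f"implied effect per new participant: {elasticity:.0%}")

# Because the outcome is binary, the whole effect must come from
# "switchers": girls whose outcome flips depending on participation.
# A 50 percent per-participant effect means half of the newly induced
# athletes would have to switch, an enormous share for an outcome
# that has no apparent connection to the treatment.
```

The point of the calculation is that a binary outcome caps how large a causal effect can be: the effect can only operate on people whose outcome would actually change.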
How could this happen? How could a quantitative analysis yield estimates that don’t make any sense? It’s the old correlation-causation story (illustrated in general terms here with an amusing set of graphs showing high correlations between variables such as “Number of people who drowned by falling into a swimming-pool” and “Number of films Nicolas Cage appeared in”). The analysis at hand was a little better than this, in that the authors had data at the state rather than just the national level, but it wasn’t good enough, given what came out of the analysis.
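The aggregation problem can be sketched with a toy simulation (entirely hypothetical numbers; this is not the authors’ model). If two state-level variables both track some unobserved state-level factor, say regional culture, they will correlate strongly even though neither causes the other:

```python
import random
random.seed(1)

# Hypothetical setup: in each "state", sports participation and
# secularism both reflect an unobserved common factor plus noise.
n_states = 50
factor = [random.gauss(0, 1) for _ in range(n_states)]
sports = [f + random.gauss(0, 0.5) for f in factor]
secular = [f + random.gauss(0, 0.5) for f in factor]

def corr(x, y):
    """Pearson correlation, computed from scratch for transparency."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / (vx * vy) ** 0.5

# A strong aggregate correlation appears with no causal link at all.
print(f"state-level correlation: {corr(sports, secular):.2f}")
```

With these noise settings the expected correlation is about 0.8, so a regression of one variable on the other would report a large, statistically impressive coefficient that says nothing about causation.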
So, no, I don’t see good evidence here “that sports participation causes women to be less religious, more likely to have children, and, if they do have children, more likely to be single mothers.”
My point here is not to pick on this particular study but to demonstrate how we can think quantitatively and apply common-sense reasoning to evaluate social-science claims. I’m not saying that this sort of “sanity check” will always work, but I do think quantitative thinking can move us forward. In this example, for instance, the authors reported estimates that were implausibly huge. But if their estimates had been more reasonable, say one-tenth or one-hundredth as large, then it would have been clear that estimating such effects from state-level aggregates was essentially impossible. And this in turn could have pushed their research in a useful direction, moving away from the study of lucky correlations into a closer engagement with their research question.