The other day I received the following e-mail from a reporter:

A colleague is working on a quick piece on a soon-to-be-released paper which argues that female hurricane names have higher death tolls since people take them less seriously. In looking through the methodology, we’re seeing a couple of potential red flags, but could use a professional’s judgment on this. Any chance you’re available to take a quick look?

One thing that caught our eye was the whole no-male-hurricanes-before-the-late-1970s thing. Does that leave us with enough male hurricanes to say anything substantive here? Is their method of rating not just whether but how feminine the earlier names were good enough? I’m also curious, of course, just how much statistical tweaking they had to do to get those results. More broadly, a lab setting in which people rate names doesn’t do it for me – in the real world these sexist impulses, which I’m sure exist, are likely swamped by a million other factors. Surely whether or not your five neighbors evacuate has a bigger impact on your behavior than storm name.

Any thoughts on this that we could use would be hugely appreciated – we don’t want to fall for journalist-bait.

Anything to avoid working . . . so I took a look and responded as follows:

I agree with you that when comparing male- and female-named hurricanes, it does not make sense to include data from the era in which all the hurricanes had girls’ names.

The paper indeed looks silly to me, although there’s possibly something there. They report that the scariest-sounding names are, in order, Omar, Bertha, Cristobal, Marco, Kyle, Arthur, and Laura/Fay/Dolly. This makes sense given the stereotypes we have (Big Bertha, etc., and Omar of course is a scary Middle Easterner, right?), but I don’t take it as meaning much, partly because these judgments are taken out of context.

I don’t see the point of figure 1. I’d rather just see a graph of the data (again, just the data since both sexes have been used in the names).

Also, if you look at their archival study, you’ll see that their coefficient was not statistically significant! That doesn’t mean the effect isn’t there, but it does mean that their sample sizes are low, and when you’re talking about hurricane deaths, you don’t have the data to say anything much more conclusive than that.

My quick summary is that I’m not convinced. But, who knows, maybe they’re right and the public health service should start naming all the hurricanes Omar and Bertha and forget about calling them Kitty and Irving.

It might be that their hypothesis is true—it makes some sense—I just think the male/female thing is probably not the most important issue. Larger questions along the same lines would be the Omar/Irving distinction or, maybe more to the point, whether it makes it too cuddly to give these storms names at all. And what about ramping up the scale, so that a category 5 storm becomes a category 10, etc?

The study has received lots of attention. Here’s a largely credulous report from the Economist, and here’s something by S. E. Cupp on the CNN site that gives a much saner take on the matter. My own quote ended up here.

Meanwhile, I heard about this study from a colleague who is interested in risk perception. My colleague wrote:

It will take me a while to make sense of the model (in supp materials) but it seems odd that they used a continuous variable for “masculinity-femininity” of names (in observational study). I’m guessing they did b/c male names weren’t added until later in the period & they would have been underpowered for straight-up male v. female. But if the effect is genuinely continuous (so that the impact of the index is the same across female- & male-named hurricanes), then presumably at least the abstract & title are misleading; it’s not gender of name but “gender-sounding,” so that more feminine female kill more than less feminine, and more male “male” more than “feminine” male. Wonder too if female names were indeed invariably rated as more feminine than male by the raters; if not, that would be weird.

As for external validity: if those go back to the 1950s, why should we believe contemporary index ratios would correspond w/ impressions of the individuals whose behavior was contemporaneous w/ the hurricanes? I’m sure all questions are answered in supplementary information etc. (I don’t like PNAS’s, Science’s & Nature’s implicit treatment of methods as “just detail”: the effect of shunting them to back of paper & then ultimately out of the “printed” version of magazine, which in PNAS’s case is not even printed in all cases.)

I’ve now read several blogs that berate the study, but none of them presents a particularly meaningful criticism (just lots of indignation & ridicule). The critical “expert” quoted in Ed Yong’s blog definitely hadn’t read the paper or didn’t understand the analyses.

The ridicule of paper by people who can’t be bothered to figure out what’s wrong (or probably in even more cases not capable of doing that) is at least as interesting as any problem in the paper!

On non-significant: Sure about that? I thought the effect of MFI interacted with measures of storm intensity such that it became significant, statistically as well as practically, as storm intensity increased. In Model 4 (where the predictors were standardized & hence centered on zero; they don’t seem to get that that’s presumably why some reviewer asked them to do that, since they observe, nonsensically, that standardizing didn’t change the fit-statistics!), the coefficient for MFI is nonsignificant, but that means only that MFI had a nonsignificant effect in storms of mean intensity; the positive coefficients for the cross-product interaction variables are both statistically significant & have signs indicating that the effect of MFI on deaths increases as storm intensity increases (& so presumably it has an even smaller effect, maybe a negative one, for storms below mean intensity).

Pre-79. I’m not sure it made “no sense” to use MFI for those. I would say just that their doing so means the effect they are measuring is not “male” vs. “female” names; it’s something about the connotations of names that applies to both female & male ones.

On scary names vs. gender effect: I have to look more closely. If the influence of MFI is the *same* for female and male storms *&* gender as a dichotomous measure has no effect (even a “nonsignificant” one that is more than trivially different from zero & has right sign; I’m sure they used a continuous measure so they’d be able to increase power), then they really should consider a more nuanced account of the mechanism. “Masculine sounding” is okay. But presumably it *is* something like how scary or hardass the name sounds. That would probably correlate with gender but be more subtle: Bertha can kick Adrian’s ass, for sure. I’m sure that is a gender-stereotyped thing to say (that effect would still involve gender), but if this is what’s going on, it’s more accurate, more interesting, & more plausible to think “tough” names would influence people’s expectations than simply that gender of name would.

I haven’t read closely enough: did they have the names rated in terms of “scariness” as well as masculinity? I’d like to know too if any “male” names were rated more “feminine” than “female” names; if so, that would reinforce that MFI isn’t properly understood as “masculinity.”
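To make my colleague's point about centering concrete, here is a minimal sketch in Python. It is not the paper's model; it is an ordinary linear regression with an interaction, fit to simulated data, and the names `femininity` and `intensity` along with every number are made up for illustration. It checks two things: standardizing the predictors leaves the fitted values (and hence any fit statistic) unchanged, and the "main effect" in the standardized model is just the slope at mean intensity.

```python
import numpy as np

def fit_ols(x, z, y):
    """Least-squares fit of y ~ 1 + x + z + x*z; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

def standardize(v):
    """Center on zero and scale to unit standard deviation."""
    return (v - v.mean()) / v.std()

# Simulated data; every number here is an assumption, not taken from the paper.
rng = np.random.default_rng(0)
n = 200
femininity = rng.normal(6.0, 3.0, n)    # hypothetical name rating
intensity = rng.normal(50.0, 20.0, n)   # hypothetical storm-intensity measure
y = (2.0 - 0.5 * femininity + 0.05 * intensity
     + 0.01 * femininity * intensity + rng.normal(0.0, 1.0, n))

raw_coef, raw_fit = fit_ols(femininity, intensity, y)
std_coef, std_fit = fit_ols(standardize(femininity), standardize(intensity), y)

# Same fitted values, so identical residuals and identical fit statistics.
same_fit = np.allclose(raw_fit, std_fit)

# Slope of y on femininity evaluated at mean intensity, in per-SD units.
slope_at_mean = femininity.std() * (raw_coef[1] + raw_coef[3] * intensity.mean())

print(same_fit)
print(std_coef[1], slope_at_mean)
```

The two printed slopes agree because the standardized model is only a reparameterization of the raw one. A "nonsignificant" coefficient on the standardized rating therefore speaks only to storms of average intensity, and says nothing on its own about the most intense ones.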

I replied:

I’m getting tired of responding to random studies such as the hurricane thing. I have no objection to them finding a pattern and doing some psych experiments, but they’re way overselling their results. Also, they’re oversimplifying their results by focusing on gender rather than Bertha etc.

I wouldn’t “berate” the study but I’d berate the naiveté by which a large difference in a small sample is taken to represent a large difference in the real world.
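Here is a quick simulation of the small-sample problem, as a sketch with made-up numbers (a true effect of 0.1 and a standard error of 1.0, neither taken from the hurricane paper). When a study is noisy relative to the true effect, any estimate that clears the usual statistical-significance bar has to be a large overestimate.

```python
import numpy as np

def significant_estimates(true_effect, se, n_sims=100_000, seed=1):
    """Simulate unbiased normal estimates; keep those with |estimate| > 1.96 * se."""
    rng = np.random.default_rng(seed)
    estimates = rng.normal(true_effect, se, n_sims)
    keep = np.abs(estimates) > 1.96 * se
    return estimates[keep], keep.mean()

true_effect, se = 0.1, 1.0   # hypothetical values chosen for illustration
sig, share_significant = significant_estimates(true_effect, se)

# Average size of a "significant" estimate relative to the truth.
exaggeration = np.abs(sig).mean() / true_effect
# Share of "significant" estimates that point in the wrong direction.
wrong_sign = (sig < 0).mean()

print(share_significant, exaggeration, wrong_sign)
```

With these settings every statistically significant estimate is at least 19.6 times the true effect in absolute value, simply because it must exceed 1.96 standard errors to count, and a fair share of them have the wrong sign.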

The cleanest discussion I’ve seen of this is by Jeremy Freese, here. Freese makes a bunch of detailed points and then gets to this:

The authors have issued a statement that argues against some criticisms of their study that others have offered. These are irrelevant to the above observations, as I [Freese] am taking everything about the measurement and model specification at their word–my starting point is the model that fully replicates the analyses that they themselves published.

A qualification is that one of their comments is that they deny they are making any claims about the importance of other factors that kill people in hurricanes. But they are. If you claim that 27 out of the 42 deaths in Hurricane Eloise would have been prevented if it was named Hurricane Charley, that is indeed a claim that diminishes the potential importance of other causes of deaths in that hurricane.

And then he raises an important general issue in science communication:

The authors’ university issued a press release with a dramatic presentation of results. The release includes quotes from authors and a photo, as well as a quote from a prominent social psychologist calling the study “proof positive.” So this isn’t something that the media just stumbled across and made viral. My view is that when researchers actively seek media attention for dramatic claims about real deaths, they make their work available for especial scrutiny by others.

16. As a coda that may or may not be relevant to the case at hand, I will confess that I [Freese] have become especially impatient by the two-step in which a breathless set of claims about findings is provided in a press release, but then the authors backtrack when talking to other scientists about how of course this is just one study and of course more work needs to be done. In particular, I have lost patience with the idea the media are to blame for extreme presentations of scientists’ work, when extreme presentations of the scientists’ work are distributed to the media by the scientists’ employers [emphasis in the original].

As the saying goes, +1. The news media are what gets us hearing about these studies (and indeed I’m contributing to it now), but the tabloid science journals such as PNAS provide incentives for researchers to engage in hype so as to get their papers published, and of course once a paper is published, with whatever errors it happens to contain, researchers have an understandable tendency to hang tough and not acknowledge problems with their claims. The underlying statistical issues are tricky, so when researchers don’t see a problem with their work, part of it can be simple misunderstanding of some subtle statistical principles which have only recently been studied carefully in some ways.