If you are trying to get ahead of the curve, you are going to be looking at a statistical or mathematical model of some kind. Models like these are ways of exploring the answers to questions you would never want to learn the answers to the hard way.
With the novel coronavirus, as with any contagion, models can range from straightforward to complex. The simplest possible projection for SARS-CoV-2 would use a classic model from 1927, known as the Kermack-McKendrick model (named for its developers). Given how infectious the virus seems to be, how many people would we expect it to infect if we did nothing at all to get in its way? If you do the fairly basic math, the answer comes back around 89 percent, or more than 290 million Americans.
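For readers who want to see that "fairly basic math," here is a minimal sketch of the final-size relation that follows from the Kermack-McKendrick model: the fraction z of the population eventually infected satisfies z = 1 - exp(-R0 * z). The reproduction number of 2.5 and the U.S. population of 330 million are assumptions chosen here because they are consistent with the 89 percent figure above; the article does not state which values were used.

```python
import math


def final_attack_rate(r0, tol=1e-12, max_iter=10_000):
    """Fraction of a population ever infected in an unmitigated epidemic.

    Solves z = 1 - exp(-r0 * z) by fixed-point iteration.
    Returns 0.0 when r0 <= 1, because the outbreak cannot take off.
    """
    if r0 <= 1.0:
        return 0.0
    z = 0.5  # any starting guess in (0, 1] converges to the nonzero root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


if __name__ == "__main__":
    ASSUMED_R0 = 2.5                  # assumption, not taken from the article
    ASSUMED_US_POPULATION = 330e6     # assumption, rounded

    share = final_attack_rate(ASSUMED_R0)
    print(f"Share infected: {share:.1%}")
    print(f"People infected: {share * ASSUMED_US_POPULATION / 1e6:.0f} million")
```

Under these assumed inputs the share comes out at roughly 0.89, in line with the figure quoted above. A different reproduction number would give a different answer, which is the point: the projection is only as good as that one input.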
How many of them would suffer severe disease? We are still trying to figure that out. There are many statistical reasons why it is difficult to estimate accurately the percentage of infected people who will die. For example, if we observe only severe cases that show up at hospitals, mortality will seem higher than it really is; if we calculate this number too early, we won’t count some of the people who are yet to die, and we will underestimate overall mortality. So we may not know the true answer until later in the pandemic. The most hopeful projections are around 0.6 percent, meaning 6 of every 1,000 people infected will die. But even under the wildly optimistic assumption that the mortality rate for covid-19 is similar to that of the flu, which kills about 1 in every 1,000 people it infects, that would mean more than 290,000 deaths in the United States alone. (In bad years, the seasonal flu kills around 50,000.)
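The arithmetic behind those death figures is simple enough to check by hand, and a short sketch makes it explicit. It takes the "more than 290 million" infections from the simple model, rounded down to 290 million, and applies the two mortality rates mentioned above. The function name is hypothetical, and the result for the 0.6 percent rate is just the product of the article's own numbers, not a figure the article reports.

```python
def expected_deaths(infected, deaths_per_1000):
    """Deaths implied by a number of infections and a per-1,000 fatality rate.

    Integer arithmetic keeps the result exact.
    """
    return infected * deaths_per_1000 // 1000


INFECTED = 290_000_000  # the simple model's estimate, rounded down

flu_like = expected_deaths(INFECTED, 1)  # flu-like rate: 1 death per 1,000 infected
hopeful = expected_deaths(INFECTED, 6)   # 0.6 percent: 6 deaths per 1,000 infected

print(f"Flu-like mortality: {flu_like:,} deaths")
print(f"0.6 percent mortality: {hopeful:,} deaths")
```

The first line reproduces the 290,000 figure. The second shows how quickly the toll grows when the fatality rate is only a few tenths of a percentage point higher.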
Still, models are not destiny, and that simple model makes a number of faulty assumptions, including that human beings move around and contact one another randomly, like molecules in a gas, instead of making very nonrandom contacts within certain networks, as we actually do. You are much more likely to contact a specific group of folks in your local community, whether at your workplace, wherever you worship or in your own household. This doesn’t mean that the model is wrong; it just means that to capture things like physical distancing, you need to make it a bit more complicated.
Several teams are working to include some of these complications. One series of models at Imperial College London estimates how effective different amounts of physical distancing might be, which tells you how many lives you can expect to save with different forms of restrictions. Moderate distancing might involve simple changes like having professionals work from home, while more extreme interventions would include school closures. This model has had a major impact on the policies of the British government and the White House.
The Imperial College (IC) estimates of what would happen if we did no distancing at all are sobering, because they assume a higher mortality rate from symptomatic disease, which was estimated from data collected in Italy’s outbreak and may be fairly representative of what happens when a health-care system is overwhelmed. The simple estimate of 290,000 deaths from the most basic model was on the low side; the IC model predicted a horrifying 2.2 million deaths in the United States if the coronavirus were left to spread without any mitigation. There are other similar models, including one from the University of Basel in Switzerland, that you can play with if you want to try putting your own numbers in.
The model that’s now getting the most attention in the United States — both from the news media and from the White House — is a very different sort of beast. With an interactive website and daily updated figures for individual states that estimate the time to the peak strain on health-care systems, the work by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington has made an enormous impact. The IHME model makes drastically different estimates of the total burden of disease and death from the pandemic than the IC model does. And over time, the IHME model can produce radically varying outcomes: One day, it’s predicting a nor’easter heading out into the ocean, the next it says the storm will dump two feet of snow on Boston.
Why is there such variation? And more important, which is right?
The IHME model is not actually a model of infectious disease because it doesn’t include transmission among people. It just assumes that new infections build up and then fade away. Nor does it model the process of physical distancing that is assumed to make them fade away. It’s an extreme example of a simplifying assumption. In technical terms, it is a statistical model that fits deaths to a curve, while the IC model is a mechanistic one in which the numbers of cases increase and decrease for a reason, and the way they increase and decrease varies with the exact parameters the model is fed. The goal of the IHME model is only to predict the timing of the peak, the health-care resources that will be demanded at that time and the total deaths before August. The IC model accounts for a more dynamic process in which one person becoming infected has an effect on the noninfected people in the model, because the newly infected person can then infect them. In the IHME model, one person being infected makes no difference to anyone else; they just get added to the count.
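To make "mechanistic" concrete, here is a minimal sketch of the simplest transmission model, in which new infections depend on how many susceptible and infectious people there currently are. It is not the IC model, which is far more detailed, and it is not the IHME approach either. The function name, the five-day infectious period, the seed size and the reproduction numbers are all assumptions picked for illustration.

```python
def simulate_sir(r0, infectious_days=5.0, days=365, dt=0.1, seed_fraction=1e-6):
    """Simulate a basic susceptible-infectious-recovered epidemic.

    Uses simple Euler time steps on population fractions.
    Returns (peak fraction infectious at once, fraction ever infected).
    """
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate per day
    s, i = 1.0 - seed_fraction, seed_fraction
    peak = i
    for _ in range(int(days / dt)):
        # New infections require contact between susceptible and
        # infectious people; this is the transmission step that a
        # pure curve-fitting model leaves out.
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak, 1.0 - s


if __name__ == "__main__":
    # Hypothetical scenarios: unmitigated spread vs. reduced contact.
    for label, r0 in [("no distancing", 2.5), ("some distancing", 1.5)]:
        peak, ever_infected = simulate_sir(r0)
        print(f"{label}: peak {peak:.1%} infectious, {ever_infected:.1%} ever infected")
```

Lowering the assumed reproduction number, which is one crude way to represent distancing, changes both the height of the peak and the final toll. In a curve-fitting model there is no such dial to turn, because the shape of the curve is taken from data elsewhere rather than generated by contact between people.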
The IHME model then works by taking the epidemic curves of previous outbreaks, in the form of numbers of deaths, and fitting those curves to the reported status of the pandemic in different states. But it also makes some odd assumptions. A caveat atop the website says clearly: “COVID-19 projections assuming full social distancing through May 2020.” Yet we are nowhere near “full social distancing.” In fact, an analysis of cellphone data has revealed that in some parts of the United States, people appear to be behaving as though we were not in the midst of the biggest public health crisis of our lifetimes.
This could mean the IHME estimates of deaths are too optimistic about the effects of physical distancing. (Asked for comment by The Washington Post, Theo Vos, a senior faculty member working on the IHME model, said the team uses more than 20 locations where daily death rates have peaked to inform the shape of the curve elsewhere. The IHME is “using all available data to estimate how many days after the implementation of social distancing measures we can expect the number of daily deaths to peak,” he said. “This is based on observed data in epidemics around the world — and is not making assumptions about the effectiveness of these measures.” The IHME is now incorporating phone location data that shows that social distancing started informally in many places before authorities ordered restrictions, Vos said: “Over the next couple of days, these findings will be incorporated into our models and will make our expected total estimates come down rather than increase. In other words, we may have been more pessimistic rather than too optimistic.”)
As Anthony S. Fauci, director of the National Institute of Allergy and Infectious Diseases, put it last month, “Models are as good as the assumptions you put into them. . . . As we get more data, then you put it in and that might change.”
The IHME model relies upon observations from individual states, and individual states vary hugely in their testing capacity and approach. To sidestep this, the model uses covid-19 deaths, on the grounds that these are more likely to be picked up accurately than infections are. But certainly not all deaths are being reported, and in a lot of places, the numbers are still too small to make decent predictions from them about what might happen when the virus spreads more rapidly there. In the early stages of a local outbreak, death counts can be heavily influenced by chance events, like the introduction of the virus to a nursing home — which can skew the projections.
That’s why the model’s estimates have been fluctuating wildly. For example, on April 8, the number of predicted deaths for Massachusetts was 5,625. By April 13, it had jumped to 8,219. It’s hard to base a detailed strategic plan on numbers like that, other than to heed the important message that we need to take this seriously and that a lot of people will die.
All this can lead to a dangerous nihilism around the results of modeling, one that will be familiar to people working on climate science: If models are so dependent on assumptions, surely they cannot be trusted. But this misses the point. If you walked into the middle of an interstate highway and got hit by a truck, depending on the exact data about the velocity of the truck (and you), physicists might come up with a range of estimates of where you would end up, and in how many pieces. While the exact predictions would be different, in one very important respect, the outcome of all of them would be the same.
Epidemiology can be complicated, but it can also be simple. Debating the merits of different models rather than taking prompt action to slow the relentless march of the coronavirus is a deadly mistake. We need no model at all to look at the grim data points of Wuhan, China; Italy; Spain; Britain; and now New York. Models will play a huge role in planning future strategies after we have emerged from this initial crisis, and we will have better data to inform them. But for the next few weeks, our duty is to stop more of those grim data points from developing.
This story was updated on April 17.