This was last updated Dec. 30, 2019.

Who will win the Iowa Democratic caucuses on Feb. 3? We can’t look into the future, but this simulator allows you to explore what sorts of outcomes could happen, depending on how some key factors change between now and the day votes are cast. The simulator is powered by a predictive model developed by Post Opinions data analyst and columnist David Byler.

The model

When you’re working through the simulator, what you’re really doing is running a statistical model. Specifically, this is a model I designed as a way to think about the likely range of outcomes in the 2020 Democratic primaries.

Statistical models sound complicated, but they’re not. Models are just a way of thinking about the world: Whenever you take in new information, use the past to make sense of it and then come to some conclusion, you’re using a mental model. If you look outside your window, see a bunch of dark storm clouds and grab your umbrella, then congratulations — you’ve already done it! You took in information (the storm clouds), used your experience (that storms usually happen when you see them) and came up with a conclusion (it might storm, so you’ll need an umbrella).

Statistical models do the same thing, but with math. When people such as me build models, we gather data from the past and make a series of decisions about how to organize that data and what conclusions to draw from that data. We then look at an unfolding scenario and use the data and conclusions we’ve drawn from it to try to think about what might happen in the future. In this case, the quantitative data we’ve used to build the model include polling averages, fundraising numbers, the amount of time left until Election Day, etc.

Models are designed and tested to ensure that the results they generate are as accurate as possible. Still, there is always the chance of a fluky result. This is why we run the simulated election 10,000 times: to get a sense of which outcomes turn up most often so we can distinguish the likely events from the ones that are less likely, but still possible. This simulator is intended to let you explore how elections and models work, as you examine how different elements of the contest can affect the range of possible outcomes.

The model inputs

Polling

Polling is far and away the most important variable in this simulator. Polls represent a snapshot of public opinion at a specific moment in time — if a gaffe, a debate, a health scare for a candidate, a policy controversy or something else registers with the public, it should cause movement in the polls. Our simulator sees polls as the vital signs of the election.

We take a pretty simple approach to average the polls. We use time to weight the polls before averaging them: polls that were conducted recently are given more weight than old polls. As primary (or caucus) day approaches, this aggregate becomes increasingly myopic — that is, even more weight is given to recent data in an effort to catch on to last-minute surges.
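To make the idea concrete, here is a minimal sketch of a time-weighted average. It is not the formula the model uses: the exponential decay, the half-life rule (30 days shrinking to 5 as the election nears) and the function name are all assumptions chosen for illustration, and the three polls are invented numbers.

```python
from datetime import date

def weighted_poll_average(polls, today, election_day):
    """Average (poll_date, pct) pairs, giving recent polls more weight."""
    days_left = max((election_day - today).days, 0)
    # Hypothetical rule: the half-life shrinks as the election approaches,
    # so the average becomes more "myopic" late in the race.
    half_life = min(30.0, max(5.0, days_left / 2))
    weighted_sum = 0.0
    total_weight = 0.0
    for poll_date, pct in polls:
        age_in_days = (today - poll_date).days
        weight = 0.5 ** (age_in_days / half_life)  # older polls count less
        weighted_sum += weight * pct
        total_weight += weight
    return weighted_sum / total_weight

# Invented polls for one candidate: (date conducted, percent support).
toy_polls = [
    (date(2019, 12, 1), 20.0),
    (date(2019, 12, 20), 24.0),
    (date(2019, 12, 28), 26.0),
]
today = date(2019, 12, 30)
far_out = weighted_poll_average(toy_polls, today, date(2020, 2, 3))
close_in = weighted_poll_average(toy_polls, today, date(2020, 1, 2))
```

With the same three polls, the average computed a few days before the vote sits closer to the newest poll than the average computed five weeks out.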

Other (excellent) poll averages use somewhat different techniques: some smooth their trends more or attempt to ferret out the unintentional biases of pollsters and correct for them. But this relatively simple approach works for our purposes.

Fundraising

There are an enormous number of ways to measure fundraising — cash on hand, total dollars raised, dollars raised from small donors, burn rate and more. But we opted to keep it simple: We developed a statistic that we call “funds over average.” The basic idea is to measure who is (and isn’t) punching above their weight in donations from individuals.

The first step in calculating it is to find the percentage of all the contributions from individual donors that each candidate has raised. (In other words, Mike Bloomberg’s cash and Jeb Bush’s Super PAC don’t count.) Then we compare that share of the fundraising total to the share each candidate would have raised if all fundraising were split evenly across the field. For example: In a contest with 15 candidates, each candidate is one-fifteenth, or about 6.7 percent, of the field. If Bernie Sanders has raised $20 million from individuals, and the rest of the field has raised a combined $20 million, Sanders has raised 50 percent of all individual contributions, a share far in excess of the 6.7 percent he would have if the money were split evenly. If former vice president Joe Biden, meanwhile, has raised only 3.2 percent of individual contributions, he is well below his expected 6.7 percent threshold.
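The arithmetic in that example can be sketched in a few lines. The text does not say whether “funds over average” is a difference or a ratio, so the subtraction below is an assumption, as are the function names.

```python
def individual_share_pct(candidate_total, field_total):
    """Percent of all individual contributions raised by one candidate."""
    return 100.0 * candidate_total / field_total

def funds_over_average(share_pct, n_candidates):
    """Share of individual money minus an even split, in percentage points.

    Assumption: the statistic is a simple difference; the real model may
    define it differently.
    """
    even_share_pct = 100.0 / n_candidates
    return share_pct - even_share_pct

# Sanders in the example: $20 million out of a $40 million total, 15 candidates.
sanders_share = individual_share_pct(20_000_000, 40_000_000)
sanders_foa = funds_over_average(sanders_share, 15)

# Biden in the example: 3.2 percent of individual contributions.
biden_foa = funds_over_average(3.2, 15)
```

Under this reading, Sanders lands about 43 points above an even split and Biden about 3.5 points below it.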

This measure doesn’t move the needle much in the simulator. It is possible that other ways of slicing and dicing money data would produce some measure that moves the model more. But, in general, our research found that good polling is a much better signal of a candidate’s strength than good fundraising.

The inputs we don’t use and why

There’s a lot this model doesn’t do.

It doesn’t predict the national popular vote; create delegate projections; examine state demographics; think about which candidates are more likely to take votes from each other; deal with candidates’ decisions to drop out of the race; or look at other factors such as endorsements.

In some of those cases, it’s because the model isn’t designed for it, either because our historical research suggested that these elements weren’t actually influential, or because we couldn’t find the data to build historical parallels. For example, we think positive media attention helps candidates in primaries, but the media industry has changed drastically over the last 40 years and we couldn’t find a satisfactory historical measure. Also, some factors such as good debate performances matter in the campaign, but we left them out because the polls capture their influence and we didn’t want to measure the same thing twice. And, in some cases, we decided that including those factors would require us to make judgments that we didn’t feel equipped to make — for example, when does a gaffe become big enough that it turns into a campaign event that needs to be accounted for in the model?

The mathematical methods employed in the model

The model that powers the simulator is empirical. Months before any user touched the simulator, I went back into historical data and examined the relationship between past candidates’ individual polling in individual states and fundraising numbers and their final share of the vote in that state. I used a statistical method called regression analysis to come up with mathematical relationships: equations which said that if a candidate had X percent in the polls and Y in fundraising, our best estimate is that they’d get Z percent of the vote.
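For readers who want to see what a regression of this kind looks like, here is a bare-bones sketch. It fits an ordinary least-squares equation of the form Z = b0 + b1·X + b2·Y in plain Python. The linear form, the function names and the “historical” rows are all assumptions: the rows are toy numbers generated from a made-up equation, not real election data, and the actual model may use different predictors or transformations.

```python
def fit_ols(rows):
    """Fit z = b0 + b1*x + b2*y by least squares using the normal equations."""
    # Build the augmented 3x4 system (X^T X | X^T z).
    a = [[0.0] * 4 for _ in range(3)]
    for x, y, z in rows:
        feats = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                a[i][j] += feats[i] * feats[j]
            a[i][3] += feats[i] * z
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def predict(coefs, polls_pct, funds_over_avg):
    """Best estimate of final vote share for one candidate."""
    b0, b1, b2 = coefs
    return b0 + b1 * polls_pct + b2 * funds_over_avg

# Toy "history": (polling %, funds over average) pairs, with vote shares
# generated from an invented rule so the fit can be checked by eye.
toy_inputs = [(30, 10), (20, -3), (10, 5), (5, -4), (15, 0)]
toy_history = [(x, y, 1.0 + 0.9 * x + 0.05 * y) for x, y in toy_inputs]

coefs = fit_ols(toy_history)
estimate = predict(coefs, polls_pct=25, funds_over_avg=8)
```

Because the toy data follow the invented rule exactly, the fit recovers it. Real historical data are noisy, which is why the spread around the estimate matters as much as the estimate itself.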

But that projection isn’t the important part of this tool. The key part is the range of possible results.

After getting the best estimate of the final vote share, I used historical data to estimate how wide the spread — or range of possible results — should be. The basic idea here is simple: If a dozen candidates are running in a caucus state with crazy rules and it’s still a month before the election, the simulator should know there is a wide range of possible outcomes. If the simulator is looking at a more straightforward primary where two candidates are running and the election is tomorrow, we can be somewhat more confident about what’s going to happen.

To estimate how big the spread of outcomes should be, I used a number of factors: the number of polls that have been conducted in recent weeks; when the last poll was conducted; whether the state is a primary or caucus; how much polls disagree on each candidate’s standing; each candidate’s polling average; how many candidates are in the race; and how far we are into the primary season. From there, I simulate a mock election 10,000 times and come up with a range of plausible vote shares for each candidate.
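A stripped-down version of that last step might look like the sketch below. It assumes each candidate’s vote share is drawn from a normal distribution centered on the best estimate and clipped to the 0-to-100 range; the real model’s distribution and the way it ties candidates together are not described here, so treat the function names, the spread values and the 5th-to-95th percentile band as illustrative choices.

```python
import random
import statistics

def simulate_vote_share(best_estimate, spread, n_sims=10_000, seed=0):
    """Draw n_sims mock results for one candidate and return them sorted."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = [
        min(100.0, max(0.0, rng.gauss(best_estimate, spread)))
        for _ in range(n_sims)
    ]
    return sorted(draws)

def plausible_range(sorted_draws, lo=0.05, hi=0.95):
    """Return the band holding the middle 90 percent of simulated results."""
    last = len(sorted_draws) - 1
    return sorted_draws[round(lo * last)], sorted_draws[round(hi * last)]

# Same best estimate, two hypothetical levels of uncertainty.
crowded_caucus = simulate_vote_share(best_estimate=25.0, spread=8.0)
two_way_primary = simulate_vote_share(best_estimate=25.0, spread=3.0)

wide_band = plausible_range(crowded_caucus)
narrow_band = plausible_range(two_way_primary)
median_result = statistics.median(crowded_caucus)
```

Both scenarios center on the same estimate, but the crowded caucus produces a much wider band of plausible results than the two-way primary.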

This process doesn’t have all the possible bells and whistles a full primary model could have, but it captures key dynamics of the race and is simple enough to be translated into an interactive simulator that users can easily play with.

The simulator

What you can do with the simulator

Play five candidates against each other

In this simulator, you’re allowed to adjust the polling and fundraising numbers of five candidates, and our polling trend line covers only the top five candidates.

There’s nothing special or magical about the number five. We mostly settled on it for design reasons. In the early states, there is a mess of candidates in the low single digits, and showing their polling average would create a too-busy visual. And on the input steps, we felt that forcing users to click through a dozen candidates, many of whom aren’t particularly well-known or are stuck in the single digits, would be excessive.

If you’re a dedicated fan of Andrew Yang or one of the other lower-tier candidates, you can still select and play them. You just have to also consider the top four candidates at the same time.

Adjust polling

The simulator allows users to change where they think candidates will stand in the polls and see what the spread might look like on the day votes are cast if that happens. We don’t expect you to automatically know what amount of change in the polling would (or wouldn’t) be reasonable, so we give you buttons that serve as a guardrail.

The button values were also chosen using a somewhat unconventional method.

The model doesn’t explicitly separate error caused by pre-election swings in the polls from primary-day polling error. So, to get a sense of how much readers should be able to move the polls, I use the model to project out two sets of simulations: final outcomes based on what we know today for a specific candidate (when there’s time for polls to move and the possibility of a polling error), and simulated outcomes on primary day (when polling error is still possible but polling movement is not). If we subtract these two sets of simulations, we’re left with estimates of how much we can expect polls to move between now and primary day.
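One way to read that subtraction is in terms of variance. If poll movement and primary-day polling error are treated as independent, the variance of today’s simulations is roughly the sum of the two, so the difference between the two variances is the part that comes from movement. The sketch below works through that reading. It is an interpretation, not a description of the model’s actual calculation, and the two sets of simulations are toy stand-ins with invented spreads.

```python
import math
import random
import statistics

rng = random.Random(1)

# Toy stand-ins for the two sets of simulations (invented spreads):
# outcomes simulated from today, and outcomes simulated on primary day.
sims_from_today = [rng.gauss(25.0, 8.0) for _ in range(10_000)]
sims_on_primary_day = [rng.gauss(25.0, 4.0) for _ in range(10_000)]

def implied_drift_sd(sims_now, sims_final):
    """Standard deviation of poll movement implied by the two spreads.

    Assumes movement and polling error are independent, so their
    variances add.
    """
    extra_variance = statistics.pvariance(sims_now) - statistics.pvariance(sims_final)
    return math.sqrt(max(extra_variance, 0.0))

drift_sd = implied_drift_sd(sims_from_today, sims_on_primary_day)
```

With these toy inputs the implied movement has a standard deviation of about 7 points, which is less than the 8-point spread of today’s simulations because some of that spread is polling error rather than movement.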

And we used those estimates to help us set the ranges on the buttons. We used round numbers — if our estimates say a candidate could realistically move 16.8 points, we round down to 15 and give people some natural-sounding options. Moreover, we calculate these button ranges every day for candidates at different polling levels: A candidate at 2 percent in the polls on election eve is much less likely to move up or down than a candidate who is leading with 30 percent of the vote three weeks before primary day.
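The rounding step is simple enough to show directly. The sketch below rounds an estimate down to the nearest multiple of five, which reproduces the 16.8-to-15 example; the step size of five and the function name are assumptions, and a candidate polling in the low single digits would presumably need a finer step.

```python
import math

def button_value(estimated_move, step=5):
    """Round an estimated move down to a multiple of `step` (hypothetical rule)."""
    return math.floor(estimated_move / step) * step

largest_button = button_value(16.8)
```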

It’s important to note that these buttons and values are grounded in data and have passed some sanity checks. But they’re not the final word. If you calculate your average differently than we do, then you’ll come up with a different amount of expected drift. We’ve also experimented with other ways to estimate this range. We think this is a reasonable approach, but we don’t think we’ve closed the book on this discussion.

Adjust fundraising

On fundraising, we take a bit of a different approach. We don’t expect readers to be as familiar with fundraising numbers as they are with polling numbers, so we give you three options: worse, the same or better. If you choose “worse,” then the candidate’s projected fundraising haul in the current reporting period is cut in half. “Better” doubles their total. These are big swings, but they’re not unprecedented: South Bend, Ind., Mayor Pete Buttigieg raised $25 million from individuals in the second quarter of this year after raising only a little more than $7 million in the first quarter, and former congressman Beto O’Rourke’s second-quarter fundraising was less than half his first-quarter haul. These swings are real, but their effect isn’t huge. Fundraising is a weak predictor of performance once polling is taken into account.
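In code, the three options amount to a lookup table of multipliers. The names below are made up for illustration, and the $10 million projection is an invented figure.

```python
# Multipliers described in the text: "worse" halves the haul, "better" doubles it.
FUNDRAISING_MULTIPLIERS = {"worse": 0.5, "same": 1.0, "better": 2.0}

def adjust_fundraising(projected_haul, choice):
    """Scale a candidate's projected haul for the current reporting period."""
    return projected_haul * FUNDRAISING_MULTIPLIERS[choice]

# Invented projection of $10 million for one candidate.
worse_case = adjust_fundraising(10_000_000, "worse")
better_case = adjust_fundraising(10_000_000, "better")
```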

How the simulator relates to the model

After readers select options for fundraising and polling for each candidate, the model simulates what would happen on primary day if your scenario played out. On the final screen, we show a simulated spread of possible percentages of the vote the candidates win, with the most common outcomes shaded in darker colors near the center. This should remind readers that, even if you give your favorite candidate an extra 15 points in the polls and pull your least favorite candidate down by 10, surprises are still possible. In early primary states, where no candidate has a commanding lead, even our most creative readers will find it difficult to design a scenario where anyone scores a guaranteed victory.

The model also produces text-based evaluations of the reader inputs — that is, we tell the reader how likely their scenario is. These likelihood assessments are created using the same process that set the button intervals: we assessed how far polls tend to drift in our model and used that to create benchmarks around what amount of drift is or isn’t typical. If there’s a 50 percent chance the polls move more than the user says, we say their input is “totally realistic.” If there’s a 20 percent chance, we say their input is “very plausible.” An input is “uncommon” if there’s a 10 percent chance of a greater shift, and “unlikely” if that chance is 2 percent or less. We use a 95 percent quantile to give readers a sense of how much of a shift is typical.
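Here is a rough sketch of how simulated drift could be turned into those labels. The four labels and their anchor points (50, 20, 10 and 2 percent) come from the text above; how the gaps between those points are handled is an assumption, as are the function names and the toy drift values.

```python
def chance_of_bigger_shift(drift_sims, user_shift):
    """Share of simulated poll movements larger than the user's chosen shift."""
    bigger = sum(1 for d in drift_sims if abs(d) > abs(user_shift))
    return bigger / len(drift_sims)

def describe_input(chance):
    """Map the chance of a bigger shift to a label.

    Assumption: each anchor from the text is treated as a lower bound,
    and anything above 2 percent but below 20 percent is "uncommon."
    """
    if chance >= 0.50:
        return "totally realistic"
    if chance >= 0.20:
        return "very plausible"
    if chance > 0.02:
        return "uncommon"
    return "unlikely"

# Toy drift simulations, in polling points.
toy_drift = [-3.0, -1.0, 0.0, 2.0, 4.0]
chance = chance_of_bigger_shift(toy_drift, user_shift=2.5)
label = describe_input(chance)
```

In the toy example, two of the five simulated movements are larger than 2.5 points, so the chance is 40 percent and the input is labeled “very plausible.”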

The simulator never says that any scenario is impossible. Politics isn’t physics, and even the most outlandish scenarios are, in the strictest sense of the word, possible. More important, seemingly unlikely things do happen. In 2016, Sanders won the Michigan Democratic Primary after trailing Hillary Clinton in the polls by more than 20 points. If this primary stretches out through the Democratic convention in July, we should expect at least a couple of highly unlikely events to happen. So please don’t mentally substitute the word “impossible” for “unlikely” if you see it in the text on the results screen.

Credits and contacts

Model design by David Byler

Front-end development by E.J. Fox

Visual design by Sergio Peçanha and Chris Rukan

Illustration by Bewilder for The Washington Post

Edited and produced by Alyssa Rosenberg

For questions and comments about the simulator, please contact David at david.byler@washpost.com and Alyssa at alyssa.rosenberg@washpost.com

The authors wish to thank Lenny Bronner for his consultations on the mathematical underpinnings of the model; Simon Glenn-Gregg for his invaluable engineering support; Dan Guild, Joshua Clinton, Charles Franklin, Marty Cohen, Chris Hull, Andrew Smith, Jay Leve and the many other academics, pollsters and bloggers who helped make available the data necessary to build the model; Meghan Kruger, Becca Clemons, Amanda Gustafson, Jeremy Bowers and the entire Washington Post engineering and product team for their support in translating this idea for public consumption; and Nick Diakopoulos and Jessica Hullman for their advice on data visualization.