If you ever find yourself on Wheel of Fortune and make it all the way to the final round, choose the letters G, H, P and O. That's the big takeaway from an analysis of the final bonus puzzles that appeared on all 1,546 Wheel of Fortune episodes between 2007 and today.
A quick refresher: on Wheel of Fortune, the contestant who's won the most money during regular play gets a shot at the bonus puzzle at the end of the show. In the bonus puzzle, you're given a category ("Thing," "Phrase," etc.) and the number of letters in the word(s) you're trying to guess. The letters R, S, T, L, N and E are gimmes -- Vanna White flips these ones over for you automatically. You then get to choose three more consonants plus one vowel -- those are revealed as well, and then you have 10 seconds to solve the puzzle.
This video of a typical bonus puzzle should suffice for the three of you who've never watched the show.
Everyone up to speed? Good! Now, if you were to take every bonus puzzle solution between 2007 and today -- all 1,546 of them -- and dump every single letter from every single puzzle out on a table, the distribution of letters would look like this:
You might expect that vowels are the most frequently appearing letters, but you may be surprised at which vowels appear the most. Nearly one out of every ten bonus puzzle letters is an O. I's aren't far behind. The six gimme letters also appear high up in the list.
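For anyone who wants to reproduce a tally like this, here is a minimal Python sketch. The three puzzles are made-up placeholders, not entries from the actual dataset, and the variable names are my own; swap in the real list of solutions to get real numbers.

```python
from collections import Counter

# Made-up placeholder puzzles, NOT taken from the actual dataset.
puzzles = ["BABY BUGGY", "HIGH VOLTAGE", "PIPING HOT"]

# Count every letter across every puzzle, ignoring spaces and punctuation.
counts = Counter(ch for p in puzzles for ch in p.upper() if ch.isalpha())
total = sum(counts.values())

# Each letter's share of all letters, most common first.
shares = {letter: n / total for letter, n in counts.most_common()}

for letter, share in shares.items():
    print(f"{letter}: {share:.1%}")
```

With only three toy puzzles the percentages mean nothing; the point is just the mechanics of dumping every letter on the table and counting.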
Now, if you're a contestant, you need to choose three of the remaining consonants (green) plus one remaining vowel (purple). Looking at this chart, you can see that you'd get your best odds choosing H, G, P and O. But! If you've ever watched the show, you know that contestants rarely make this selection. Instead, they most frequently choose the letters C, D, M and A.
Roughly 65 percent of contestants choose C, even though five other consonants appear more frequently. M is an even worse selection -- nearly 60 percent of people pick it, but it sits near the bottom of the pack in bonus puzzle letter frequency. For an apples-to-apples comparison, here's a chart of the percent of contestants who choose each letter versus the percent likelihood of the letters appearing in a bonus puzzle answer.
I've colored the four most frequently chosen letters so you can see how they stack up. Nearly 60 percent of contestants pick M, even though it shows up in an answer only 20 percent of the time. C and D are similarly not great choices. On the other hand, a little over 30 percent of contestants pick O, but it appears in nearly 70 percent of the answers!
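Note that "appears in an answer" is a different measure from a letter's share of all letters: it asks whether a puzzle contains the letter at least once. A rough sketch of that calculation follows, again on made-up placeholder puzzles and with a helper name (`appearance_rate`) that is my own invention.

```python
# Made-up placeholder puzzles, NOT from the actual dataset.
puzzles = ["BABY BUGGY", "HIGH VOLTAGE", "PIPING HOT", "MAGIC WAND"]


def appearance_rate(letter, puzzles):
    """Fraction of puzzles whose answer contains `letter` at least once."""
    hits = sum(1 for p in puzzles if letter.upper() in p.upper())
    return hits / len(puzzles)


for letter in "CDMAGHPO":
    rate = appearance_rate(letter, puzzles)
    print(f"{letter}: appears in {rate:.0%} of puzzles")
```

A letter that turns up twice in one answer still counts only once here, which is why these percentages can run much higher than the letter-share figures.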
Why the discrepancy? People are choosing letters based on their overall frequency in the English language. A is the most common vowel other than the gimme E, and C, D and M are among the most common consonants.
But the show's producers know that people are guessing C, D, M and A, so they're going to choose puzzles without those letters when they can. And since they give you R, S, T, L, N and E right off the bat, they're also going to make sure those letters are under-represented in the puzzles. To illustrate this, here's a scatterplot showing total English language letter frequency versus bonus puzzle letter frequency. Letters in the green area of the chart are over-represented in bonus puzzles relative to the English language overall, while letters in the purple area are under-represented. You'll notice that R, S, T, L, N, E, M, D and A are all in the purple area.
So: going back to the very first chart, you can see that you get your best odds if you choose G, H, P and O. We can also quantify exactly how much better a selection these letters are than the standard CDMA picks. To do that, for each of the 1,546 bonus puzzles in the database, I calculated how many letters you'd get right with GHPO, and how many letters you'd get with CDMA. On average, the GHPO selection yields about 2.46 revealed letters versus CDMA's 1.85. This doesn't seem like a huge difference, but let's look at the distribution.
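The per-puzzle count described above can be sketched like this. It is an illustration rather than the exact script behind the numbers: the `revealed` helper is a name I made up, and the four puzzles are placeholders, not real bonus puzzles.

```python
def revealed(puzzle, picks):
    """Number of letter tiles in `puzzle` uncovered by the letters in `picks`."""
    picks = set(picks.upper())
    return sum(1 for ch in puzzle.upper() if ch in picks)


# Made-up placeholder puzzles, NOT from the actual dataset.
puzzles = ["BABY BUGGY", "HIGH VOLTAGE", "PIPING HOT", "MAGIC WAND"]

for picks in ("GHPO", "CDMA"):
    per_puzzle = [revealed(p, picks) for p in puzzles]
    average = sum(per_puzzle) / len(per_puzzle)
    print(picks, per_puzzle, f"average {average:.2f}")
```

Each occurrence counts, so a puzzle with two G's gives a G pick two revealed tiles. Run over the full list of solutions, the averages are what get compared.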
In the chart below, I've plotted how many times you get 0 letters, 1 letter, 2 letters, etc., all the way up to 8, when choosing GHPO and CDMA.
So for instance, a selection of CDMA yields 0 revealed letters in 203 of the 1,546 puzzles. Which is terrible! This means the standard selection doesn't help you at all 13 percent of the time. By contrast, with GHPO you'd get 0 revealed letters in only 115 out of the 1,546 puzzles. Or, look at it from the other end: a GHPO selection will reveal four or more letters in 341 out of the 1,546 puzzles. But a CDMA selection gets you four or more letters in just 158 of those puzzles.
Overall, you can see that the GHPO distribution favors the right side of the chart -- more revealed letters -- considerably more than the CDMA distribution.
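To build a distribution like this yourself, you can bucket puzzles by how many tiles a selection reveals. The sketch below assumes the same made-up placeholder puzzles and hypothetical helper names as before, so its output is illustrative only.

```python
from collections import Counter


def revealed(puzzle, picks):
    """Number of letter tiles in `puzzle` uncovered by the letters in `picks`."""
    picks = set(picks.upper())
    return sum(1 for ch in puzzle.upper() if ch in picks)


def reveal_distribution(puzzles, picks):
    """Map 'tiles revealed' to 'number of puzzles with that many revealed'."""
    return Counter(revealed(p, picks) for p in puzzles)


# Made-up placeholder puzzles, NOT from the actual dataset.
puzzles = ["BABY BUGGY", "HIGH VOLTAGE", "PIPING HOT", "MAGIC WAND"]

for picks in ("GHPO", "CDMA"):
    dist = reveal_distribution(puzzles, picks)
    zero = dist[0]  # puzzles where the picks reveal nothing
    four_plus = sum(n for k, n in dist.items() if k >= 4)
    print(f"{picks}: {zero} with nothing revealed, {four_plus} with four or more")
```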
So in other words, by choosing GHPO over CDMA you're roughly twice as likely to get four or more letters revealed, and a little over half as likely to get nothing. Why would you choose anything else? That is, at least until the show's producers catch on...
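If you want to check that arithmetic, the counts quoted above are all you need. This snippet uses only the figures from this post (1,546 puzzles; 203 and 115 with nothing revealed; 158 and 341 with four or more); the variable names are my own.

```python
TOTAL = 1546  # bonus puzzles in the dataset, per the post

# Counts quoted in the post for each selection.
zero_reveals = {"CDMA": 203, "GHPO": 115}
four_plus = {"CDMA": 158, "GHPO": 341}

for picks in ("CDMA", "GHPO"):
    print(
        f"{picks}: nothing revealed {zero_reveals[picks] / TOTAL:.1%}, "
        f"four or more {four_plus[picks] / TOTAL:.1%}"
    )

# How GHPO compares with CDMA on each measure.
four_plus_ratio = four_plus["GHPO"] / four_plus["CDMA"]
zero_ratio = zero_reveals["GHPO"] / zero_reveals["CDMA"]
print(f"four or more: {four_plus_ratio:.2f}x as often")
print(f"nothing revealed: {zero_ratio:.2f}x as often")
```

The ratios come out to about 2.16 and 0.57.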
A note on the data
This post was inspired by a weekend Reddit thread in which redditor "PhaethonPrime" mined data from wheeloffortunesolutions.com. But there are some issues with this dataset: because of the irregular structure of the site, you can only get answers back to 2011. And the site includes listings for reruns that run on Saturdays and over the summer, so many of the bonus puzzles appear multiple times, which will obviously throw your counts off.
Instead, I grabbed puzzles from the Wheel of Fortune Bonus Puzzle Compendium, a mysterious and frankly sketchy Angelfire site. However, their puzzles appear to be updated daily, they don't include reruns, and they're complete all the way back to 2007. I spot-checked these puzzles against the ones listed on wheeloffortunesolutions.com and found that they matched, giving me a fair amount of confidence in the numbers.
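For anyone who does end up working from a listing that includes reruns, one rough way to keep them from inflating the counts is to keep only the first airing of each puzzle. This is a sketch under assumptions: the (air date, puzzle) row layout, the dates and the puzzles are all made up for illustration.

```python
# Hypothetical listing of (air_date, puzzle) rows; dates and puzzles are made up.
listing = [
    ("2013-01-07", "PIPING HOT"),
    ("2013-01-08", "MAGIC WAND"),
    ("2013-06-15", "PIPING HOT"),  # pretend this row is a rerun
]

seen = set()
unique_puzzles = []
# ISO-formatted dates sort chronologically, so the first airing wins.
for air_date, puzzle in sorted(listing):
    if puzzle not in seen:
        seen.add(puzzle)
        unique_puzzles.append(puzzle)

print(unique_puzzles)
```

One caveat: matching on puzzle text alone would also drop a puzzle the show genuinely used twice, so treat this as a first pass rather than a clean fix.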