The game Wordle is taking the world by storm. As a biostatistician, I wanted to jump in and provide an analysis different from what you'll read elsewhere.
Specifically, I will optimize three different strategies for the game, strategies you may or may not currently use.
Although I prefer strategy three, all of them have merits depending on the type of player you are and your knowledge of the English language.
For starters, here are the general rules:
Unlike other analyses online, I wanted to make this specific to Wordle. I am using the December 2021 edition of Collins Scrabble Words from the UK. Furthermore, I am limiting my analysis to the 12,915 words which contain 5 letters. Most other analyses look at a breakdown of every word in the English language, but five-letter words are truly different from others.
The most common letter is S, which appears in 46% of all five-letter words, while Q is the least common and appears in less than 1% of them. Rounding out the top five are the letters E, A, R, and O.
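For anyone who wants to reproduce this kind of count, here is a minimal sketch. It assumes the figure means "share of words that contain the letter at least once", and it runs on a tiny hypothetical sample rather than the Collins list; the function name letter_presence is my own for illustration.

```python
from collections import Counter

def letter_presence(words):
    """Return {letter: share of words containing that letter at least once}."""
    counts = Counter()
    for word in words:
        # set() so a repeated letter (the two S's in "sores") counts once per word
        counts.update(set(word.lower()))
    total = len(words)
    # most_common() orders letters from most to least frequent
    return {letter: n / total for letter, n in counts.most_common()}

# Hypothetical six-word sample, NOT the Collins list used in this analysis.
sample_words = ["arose", "sores", "until", "issue", "comes", "inlet"]
shares = letter_presence(sample_words)
ranked = list(shares)  # letters ordered by how many words contain them
```

Swapping sample_words for a full five-letter word list gives the same kind of percentages quoted above.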
1. "TARGETING YELLOW SQUARES" Strategy
When you target yellow squares, you are simply hoping for any letter(s) found in the final word. You don't care about the order. Perhaps you are great at Scrabble or played Jumble growing up.
In order to target yellow squares, you're best off making your first word one that contains the five most common letters: S, E, A, R, and O.
In this case the word AROSE would be the best possible guess. In the 12,915 words, 64,575 letters are used. The letters in AROSE account for 43% of all possibilities. In other words, on average, you'll get about two letters correct after this first guess (43% of 5 letters is roughly 2.15). They could be yellow or green.
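The coverage arithmetic can be sketched as follows. This assumes "possibilities" means all letter occurrences across the word list (5 per word) and that the answer is equally likely to be any word. The sample list is hypothetical and the name letter_coverage is mine, so the numbers it produces are not the ones quoted in this post.

```python
from collections import Counter

def letter_coverage(guesses, words):
    """Share of all letter occurrences in `words` matched by letters in `guesses`."""
    guessed = set("".join(guesses).lower())
    counts = Counter("".join(words).lower())  # every occurrence, repeats included
    total = sum(counts.values())              # 5 letters per five-letter word
    return sum(counts[letter] for letter in guessed) / total

# Hypothetical six-word sample, NOT the Collins list used in this analysis.
sample_words = ["arose", "sores", "until", "issue", "comes", "inlet"]

one_guess = letter_coverage(["arose"], sample_words)
two_guesses = letter_coverage(["arose", "until"], sample_words)

# Expected number of answer letters revealed (yellow or green) by the first guess
expected_hits = 5 * one_guess
```

On the full list, the same multiplication turns the 43% coverage into the roughly two letters per guess mentioned above.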
Although this looks great, there are still 408 words that end with E and have an O somewhere else in the word. This goes down to 188 when you take away A, R, and S.
Another approach is to knock out the 10 most common letters but do so with zero overlap. In this case, you can start with AROSE and then move on to UNTIL. You've now accounted for 68% of all possibilities. This will yield you, on average, just over 3 letters.
In the example above, there are now 40 possibilities, down considerably from 400+ when you only had AROSE, and only 21 once you eliminate the grey letters.
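Counting the words that remain after a guess can be sketched like this. The colouring rule below is the usual Wordle behaviour as I understand it (greens first, then yellows limited by how many copies of the letter are left in the answer). The function names and the sample list are hypothetical, and G, Y, and X stand for green, yellow, and grey.

```python
from collections import Counter

def feedback(guess, answer):
    """Return one character per guess letter: G (green), Y (yellow), X (grey)."""
    result = ["X"] * len(guess)
    remaining = Counter()  # answer letters not already matched as green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        # A letter goes yellow only while unmatched copies remain in the answer
        if result[i] == "X" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def consistent(words, guess, pattern):
    """Keep only the words that would have produced `pattern` for `guess`."""
    return [w for w in words if feedback(guess, w) == pattern]

# Hypothetical six-word sample, NOT the Collins list used in this analysis.
sample_words = ["arose", "sores", "until", "issue", "comes", "inlet"]
left = consistent(sample_words, "arose", feedback("arose", "issue"))
```

Calling consistent once per guess, each time on the previous result, narrows the list the same way the counts above shrink from 400+ to 40.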
You could also try for 15 letters, but doing so with no overlap means you can't use the same words you did above. Now, the best approach is to use the words PLAIN, COMER, and DUSTY (in that order, according to how common they are). You're now up to 84% of all possibilities and you have only used 50% of your guesses.
In this case, the only possible word is "issue."
Finally, you could target the 20 most common letters but at the expense of an extra guess. To ensure no overlap go with MALTY, PRONG, CUBED, and WHISK (in that order). You've now covered 95.5% of all letter possibilities but only have two guesses left.
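A quick way to confirm that a set of opening words really has zero overlap is to compare the number of distinct letters with the total number of letters. This is only a sketch, and the helper names are hypothetical.

```python
def distinct_letters(words):
    """Set of letters used across all the given words."""
    return set("".join(words).lower())

def has_no_overlap(words):
    """True when no letter is repeated within or across the words."""
    joined = "".join(words).lower()
    return len(set(joined)) == len(joined)

ten = ["arose", "until"]
fifteen = ["plain", "comer", "dusty"]
twenty = ["malty", "prong", "cubed", "whisk"]

# Each opening should use 5 new letters per word
sizes = [len(distinct_letters(opening)) for opening in (ten, fifteen, twenty)]
all_clean = all(has_no_overlap(opening) for opening in (ten, fifteen, twenty))
```

By contrast, has_no_overlap(["arose", "inlet"]) is False because both words contain an E, which is exactly the trade-off in the modified approach.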
Using this strategy, the word "comes" is the only possible word with this combination.
A slight modification of this strategy is to start with the word AROSE but then switch to a strategy you customize as you go. For example, if only E goes yellow, target the next four most common letters (i.e. T, L, I, and N) by guessing a word that contains them but does not have E in that last location (in this case, go with INLET as word number two). You've now accounted for 64% of all possibilities, BUT you have a greater chance that the E will end up green. So you cover 6% less than with UNTIL, but gain the possibility of a green. Or, worst case, you'll learn another spot where E is not located. To me, it is a worthwhile trade-off, but you are thinking on the fly.
2. "TARGETING GREEN SQUARES" Strategy
When you target green squares, you get the added bonus of knowing not only the proper letter, but its location. To do this, I've broken down all five-letter words by both the most common letter and where in the word that letter is found.
From this, we know that, of all 64,575 letter possibilities, 6.1% consist of an S at the end of the word. Likewise, S is the most common possibility at the start.
A and O are most likely found at spot number two, while E, by far, is most commonly found in the 4th location. R is most likely to be found in location three.
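The letter-by-position breakdown can be sketched as below. To match the percentages quoted here, each count is divided by the total number of letter slots (5 per word) rather than by the number of words. The function name slot_shares and the sample list are hypothetical.

```python
def slot_shares(letter, words):
    """For each position, the share of ALL letter slots holding `letter` there."""
    letter = letter.lower()
    length = len(words[0])
    hits = [0] * length
    for word in words:
        for i, ch in enumerate(word.lower()):
            if ch == letter:
                hits[i] += 1
    total_slots = len(words) * length
    return [h / total_slots for h in hits]

# Hypothetical six-word sample, NOT the Collins list used in this analysis.
sample_words = ["arose", "sores", "until", "issue", "comes", "inlet"]

s_by_position = slot_shares("s", sample_words)  # index 0 is the first spot
e_by_position = slot_shares("e", sample_words)
```

Running this for every letter of the alphabet produces the full letter-by-location table.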
While "arose" was the best word in strategy number one, "arose" has the letter A in a location where it is not likely to be found (i.e. the start). S is also least likely to be found in the middle of the word, and that's where "arose" has it.
In this "targeting green" strategy, the best possible guess would be "sares", but this isn't a real word. In that case, SORES is most likely to get you a green. In fact, on average, you'll get a green 17.2% of the time.
Of course, the trade-off is that you will have, on average, 33.8% of the letters correct, versus 43.1% with "arose". So once again, it depends what type of player you are.
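One way to score a guess for greens is sketched below. It assumes the green percentage is the share of letter slots, across every possible answer, where the guess has the right letter in the right place. The name green_rate and the sample list are hypothetical, so the outputs differ from the figures quoted here.

```python
def green_rate(guess, words):
    """Share of (answer, position) slots where `guess` has the exact letter."""
    guess = guess.lower()
    matches = sum(
        g == a
        for word in words
        for g, a in zip(guess, word.lower())
    )
    return matches / (len(words) * len(guess))

# Hypothetical six-word sample, NOT the Collins list used in this analysis.
sample_words = ["arose", "sores", "until", "issue", "comes", "inlet"]

sores_rate = green_rate("sores", sample_words)
arose_rate = green_rate("arose", sample_words)
```

Even on this toy sample, SORES scores higher for greens than AROSE, which is the direction of the trade-off described above.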
PLEASE NOTE: The chart above is also very helpful when you have a yellow letter and you want to know where it could end up. For example, you are wise to put a yellow "J" at the start, where it is 2.5 times more likely to be found than in the other four spots combined.
3. "TARGETING YELLOW AND GREEN SQUARES" Strategy
This is my preferred strategy. While you are less likely to get a yellow square, the increase in getting a green square makes this worth it. Getting a green reduces the number of possible final words.
For this strategy, go with PARES, COULD, and then MINTY.
This approach not only targets all 15 of the most common letters, it also places 13 of the 15 in their most commonly found locations.
In this approach, on average, you cover 32.6% of the green possibilities and 83.8% of the letters (i.e. yellow).
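Both numbers for a multi-word opening can be estimated in one pass. This is a sketch under my own assumptions: a slot counts as green if any of the opening words has the right letter in that position, and as covered if the letter appears anywhere in the opening. The name opening_coverage and the five-word sample are hypothetical.

```python
def opening_coverage(guesses, words):
    """Return (green share, letter share) over all letter slots in `words`."""
    guesses = [g.lower() for g in guesses]
    letters = set("".join(guesses))
    slots = green = present = 0
    for word in words:
        for i, ch in enumerate(word.lower()):
            slots += 1
            # Green: some opening word has this exact letter in this position
            if any(g[i] == ch for g in guesses):
                green += 1
            # Covered: the letter appears somewhere in the opening words
            if ch in letters:
                present += 1
    return green / slots, present / slots

# Hypothetical five-word sample, NOT the Collins list used in this analysis.
sample_words = ["arose", "sores", "issue", "comes", "whisk"]
green_share, letter_share = opening_coverage(["pares", "could", "minty"], sample_words)
```

Running the same function on different three-word openings makes it easy to compare strategies side by side.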
Ultimately, the strategy you choose is up to you. Consider experimenting with them all. In any given game, one strategy will be best.
But statistically, that best strategy, on average, is number three.