Friday, December 8, 2023

Creating and Solving Spelling Bee Puzzles in Python

I enjoy playing the New York Times' game Spelling Bee. I wondered how to write a Python script that creates and solves Spelling Bee puzzles, and decided to take a stab at it.

For those who aren't familiar, each puzzle is built on a seed word containing exactly seven unique letters. These are randomly arranged in a hexagonal layout resembling a honeycomb. You make words at least four letters long from these letters (you can use letters more than once) and the puzzle judges whether they are real words. One letter is placed in the center of the puzzle and must appear in every word. Words are scored based on their lengths, and the game designates you a genius (or a lesser skill level) based on your score. Every puzzle has at least one pangram, a word that uses all the letters in the puzzle at least once. The seed word is one, although there can be more.
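The scoring isn't spelled out above. As a rough sketch, assuming the Times' published rules (a four-letter word is worth one point, a longer word is worth one point per letter, and a pangram earns seven bonus points), it could look like this. The function name `score_word` is made up for illustration and is not part of the script below.

```python
def score_word(word, puzzle_letters):
    """Score one accepted word under the assumed rules described above."""
    # four-letter words are worth 1 point; longer words 1 point per letter
    points = 1 if len(word) == 4 else len(word)
    # a pangram uses every puzzle letter at least once: 7 bonus points
    if set(puzzle_letters) <= set(word):
        points += 7
    return points


print(score_word("itch", "pitching"))      # four letters
print(score_word("pitch", "pitching"))     # five letters
print(score_word("pitching", "pitching"))  # pangram
```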

It turns out you don't need much code at all to generate a puzzle that meets the criteria. Given a list of English words, you can easily identify all the ones that could be a seed word (contain only letters and have exactly seven unique ones). Then you can either pick one at random or ask the user to enter one (making sure it qualifies) as the seed for a puzzle.
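As a minimal sketch of that seed-word test, here is the check pulled out into a helper and run against a tiny stand-in word list. The name `is_seed_word` and the sample words are illustrative only; the full script below does the same thing inline against the system word list.

```python
def is_seed_word(word):
    """True if word is all lowercase letters with exactly 7 distinct ones."""
    return word.isalpha() and word.islower() and len(set(word)) == 7


# stand-in for a real word list
sample_words = ["pitching", "Jukebox", "apple", "blowtorch", "picture", "don't"]
seed_candidates = [w for w in sample_words if is_seed_word(w)]
print(seed_candidates)
```

Here "Jukebox" is rejected for its capital letter, "apple" has too few distinct letters, "blowtorch" has eight, and "don't" contains punctuation.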

Most Unix-like systems have a word list at /usr/share/dict/words. This list exists on Mac OS X, so I've used it. The list does have a flaw: it's much bigger than the one the New York Times uses. (I've often been surprised by the words that aren't recognized by the official Spelling Bee puzzle.) This means that not only will the solution set differ somewhat from the Times' answers, but you'll also see a number of obscure and naughty words that you'd never see in theirs. No big deal: the script works with any word list, so you can substitute whatever you want. Exercise for the reader and all that.

The basic approach I took is:

  1. Read all the words into a list. Make a second list containing only the words that could be seed words (exactly seven unique letters).
  2. Ask the user to select a seed word or allow the script to choose one randomly. If the user enters one, it is validated against the seed word list. If the user allows the script to choose one, they're further asked for a sequence of letters that should appear in the seed (I have noticed that puzzles often contain "ing" or another sequence of letters that constrains the solutions, so I added the ability to do that, if you want).
    • The user may also just enter a string of 7 unique letters that do not form a word. (You might have this instead of the actual seed word if you are trying to solve an existing puzzle.) The script will find the shortest pangram made of those letters and use that as the seed word.
  3. Ask the user to choose which letter should be in the center of the puzzle, or again choose it randomly.
  4. Find all the words in the word list that are solutions to the puzzle. These are the ones that are at least four letters long, contain only letters from the seed word, and contain the center letter.
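The filtering in step 4, and the pangram lookup from the bullet under step 2, can be sketched as small functions. This is an illustration under the rules described above, not the script itself; `solve`, `find_pangram`, and the short word list are hypothetical names and data chosen for the example.

```python
def solve(words, seed_word, center_letter, min_length=4):
    """Return the words that are valid answers for the puzzle."""
    letters = set(seed_word)
    return [w for w in words
            if len(w) >= min_length       # long enough
            and center_letter in w        # uses the center letter
            and set(w) <= letters]        # uses only puzzle letters


def find_pangram(words, letters):
    """Return the shortest word using exactly these letters, or None."""
    target = set(letters)
    matches = [w for w in words if set(w) == target]
    return min(matches, key=len) if matches else None


words = ["pitching", "pinch", "itch", "chip", "pit", "night", "thing", "pitch", "apple"]
answers = solve(words, "pitching", "c")
print(answers)
print(find_pangram(words, "cghinpt"))
```

Testing `set(w) <= letters` is the same subset check the script writes as `not set(word).difference(seed_word)`.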

The script can be used either to generate puzzles (by finding good seed words, trying to solve, then checking your answers) or to help solve a puzzle, including the ones in the Times.

You could expand it into an actual game that you can play, with a GUI and all that, but I'll leave that for you. Here's the code.

# bee.py - using UNIX word list, create a puzzle like the New York Times' 
#          "Spelling Bee," along with all possible solutions

# The UNIX word list is more extensive than the Times' list; you'll see
# some mighty obscure words, and also some totally inappropriate ones.

import random, itertools

# read the word list, closing the file once it has been read
with open("/usr/share/dict/words") as word_file:
    all_words = word_file.read().splitlines()

# find potential seed words: exactly 7 unique letters and no uppercase or punctuation
seed_words = [word for word in all_words if len(word) > 6 and word.isalpha() and 
              len(set(word)) == 7 and word.lower() == word]
print(len(seed_words), "seed words")

# get or randomly choose seed word for puzzle
seed_word = ""
while seed_word not in seed_words:
    seed_word = input("seed word, exactly 7 unique letters (enter for random): " ).lower()
    if seed_word == "":
        required_letters = input(
            "... substring the random word must contain: ").lower()
        # filter first so that an impossible substring can't cause an endless loop
        candidates = [word for word in seed_words if required_letters in word]
        if not candidates:
            print("no seed word contains", required_letters.upper())
            continue
        seed_word = random.choice(candidates)
        print("seed word:", seed_word.upper())
    else:
        # the user entered a word but it is not a seed word
        if seed_word not in seed_words:
            seed_word = set(seed_word.lower())
            if len(seed_word) == 7:
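                # shortest pangram for these letters, or "" if there is none
                # (default avoids a ValueError when no word matches)
                seed_word = min((word for word in seed_words if set(word) == seed_word),
                                key=len, default="")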
                if seed_word:
                    print("seed word:", seed_word.upper())
            else:
                seed_word = ""

# get or choose center letter for puzzle, which must appear in all answers
seed_letters = set(seed_word)
center_letter = ""
while len(center_letter) != 1 or center_letter not in seed_letters:
    center_letter = input("center letter (enter for random): ").lower()
    if center_letter == "":
        # choose among the unique letters so repeated letters aren't favored
        center_letter = random.choice(sorted(seed_letters))
        print("center letter:", center_letter.upper())

# print puzzle, randomized but with center letter in middle
puzzle_letters = list(set(seed_word.upper()))
random.shuffle(puzzle_letters)
# put the center letter in the center
center_index = puzzle_letters.index(center_letter.upper())
puzzle_letters[center_index], puzzle_letters[3] = puzzle_letters[3], puzzle_letters[center_index]
print("puzzle:  {0} {1}\n        {2} {3} {4}\n         {5} {6}".format(*puzzle_letters))

# generate solution words. print with pangrams in caps
solution_words = [word for word in all_words if len(word) > 3 and center_letter in word 
                  and not set(word).difference(seed_word)]
print(len(solution_words), "solution words:", sorted(
    word.upper() if len(set(word)) == 7 else word for word in solution_words))