Hacker News 2h ago

How information theory saved my word game

Comments

Information theory · word games

The number nobody knows

Sometimes the medium and the message have a more complicated relationship than expected.

"B as in Bravo." You've done it: spelling a word down a bad phone line, reaching past the letters for whole words, because B and D and P all blur into the same mush the moment the line frays. That is not a quirk of bad reception. It is one of the deepest ideas in modern communication, performed by a human in real time, the same trick that keeps a spacecraft's signal legible across millions of kilometres of noise.

Marshall McLuhan gets the credit for the medium is the message, but Claude Shannon had beaten him to a colder version of it years earlier: to a machine moving your words, the meaning doesn't matter at all; only the medium does, and which of its signals can be told apart. Bravo and Delta survive a bad line; B and D don't. You sorted that out by ear, without a flicker of thought.

Push the same instinct to its limit (how much can you force through a noisy channel and still be understood with perfect certainty?) and you walk straight into a question Shannon posed in 1956. I didn't arrive there as a mathematician; I'm not one. I came from the other direction entirely: I was trying to build a word game that uses deduction instead of clues. Something I expected to be simple.

Let me tell you how the seemingly simplest thing I've ever shipped turned out to be balanced on one of the oldest open problems there is.

The "simple" thing

Here is the whole game. A half-finished crossword grid. A queue of letters waiting to be placed. You're served a letter, you tap the cell it belongs in, and you keep going until the grid is full. One rule.

The thing I wanted out of it seemed almost too small to bother stating: the puzzle should never make you guess. No moment where you've done everything right, the logic runs dry, and the game just wants you to pick a cell and hope. Every letter placeable by reasoning alone.

That struck me as the easy part: it's a crossword, so I figured the hardest part would be building a good dictionary and curating the crosswords. I would like to go back and warn that version of me...

The wall

The dictionary took months. When it was finally good, I did the obvious thing: I pointed the generator at an empty folder and asked it for a few thousand boards to sift through. Fill a grid with real words, hide some, hand the rest back one at a time, keep only the boards a person could finish by pure logic. I'd spend my evenings curating the haul.

It made about forty. Then it stalled, spinning up board after board and discarding every one, because almost none came out deducible. After a few late nights, rewriting the generator, rethinking the problem, it never got past a hundred.

That stopped me cold. A five-by-five grid that has to spell real words across and down is a combinatorial ocean. The space of possible boards is beyond human understanding, but my generator couldn't find a hundred a person could solve. This wasn't a speed problem I could optimise away. It was a wall, and it asked a question I couldn't answer: could deducible boards really be that rare, only a thin scatter of them playable? Had I misjudged the configuration so badly that the game I'd dreamed up simply couldn't exist?

For a few bad days I believed the project was dead, not hard, not slow, but impossible, my own generator insisting that the open field I had imagined was nothing but a maze of dead ends.

What gave me a glimmer of hope, just enough to keep pushing forward, was how little separated the boards it threw away from the ones it kept. The same grid, the same served letter, and a single uncovered tile is the whole distance between a guess and a certainty. Maybe my view of the problem was the problem.

It wasn't a bug

For weeks I hunted it like a bug: a flaw in the generator, a dictionary too thin, a stripping pass too greedy. It was none of those. It was the thing from the bad phone line at the top of this page (a communication problem, old and famous), and I'd been too far down among grids and dictionaries to recognise the shape of it.

A puzzle is a channel: I'm the sender, the player is the decoder, I transmit a letter and they have to recover which cell I meant. Ordinary crosswords are noisy channels. The decoder is usually right, and when they aren't they shrug, backtrack, and fix it. A little confusion is fine there; in an everyday crossword it is half the fun. But I had promised something stricter than usually right. I had promised never wrong, and never is not a tighter setting on the same dial. It is a different machine.

Confusability

Here is the object I'd been groping toward in the dark. Draw a dot for every open cell; draw a line between two dots whenever those cells could take the same served letter, whenever they're confusable. That graph has a name: the confusability graph. And a puzzle is deducible exactly when the open cells form an independent set: dots with no line between any two of them. No confusable pairs. No coin flips.

The exact shape of the rule I'd set myself, drawn in a language I didn't know existed. "B as in Bravo" is the same idea in human form, and by a fun quirk of history the phonetic alphabet it comes from was finalised in 1956, the very year Shannon published the problem. More than half a century of mathematics was already sitting there, waiting for me.

I could tell whether a finished board was deducible; what I could not yet do was build one that way on purpose. To learn how, I first had to understand what Shannon had actually found.

Five beats four

Shannon had worked out the shape of this decades ago, on the simplest case anyone could draw. Picture five signals on a clock face, each one so close to its two neighbours that the line could smear them together. Send them one at a time and you can safely use only two: pick a pair that aren't adjacent and nothing you transmit can be mistaken for anything else. Two out of five. A small, tidy, disappointing number.

What Shannon noticed is that you do much better if you stop sending the signals one at a time and bundle them, treating a whole sequence as one message. Sent one at a time, two transmissions give you 2 × 2 = 4 messages that can never be confused. Bundle the pair and choose them well, and you can carry five. (The fifth fits because two bundles may share a confusable signal in one slot as long as they stay clearly apart in the other: a signal too risky to use alone is safe with a partner.)

And the gain compounds. Ten signals sent one at a time give two to the tenth, 1,024 safe messages; bundled, they give 3,125. So the number of safe messages multiplies by 2.236 with every signal you add, not by 2. Its exact value is the square root of five (√5 ≈ 2.236).

Bring it back to the board. A deducible puzzle is a set of open cells with no confusable pair among them, and the safe messages on the clock are the same structure: an independent set in the confusability graph. The difference is only scope. I needed a single board to come out safe; Shannon wanted the best multiplier the channel can hold as the bundles grow without limit, and he gave it a name, its capacity.

The flashlight

Every standard use of this math runs one direction: you are handed a fixed graph and asked to find the independent set hidden inside it, or to bound how large it could be. That is a detector, and a better detector was no help to me, because my generator was still producing boards blindly and throwing nearly all of them away.

I did not need to find an independent set inside a board I was stuck with. I needed to grow one: to build the board so that its open cells were an independent set from the very first letter, and to refuse to lay down any letter that would break it.

The generator still rolls dice; I never found a purely deterministic way to build a deducible board. But every roll is loaded. At each step it looks at the confusability graph taking shape under the board and weighs every choice against any move that would add an edge: a served letter with two possible homes, two open cells that could take the same tile. It reveals the one square that breaks a tie, favours the fill that leaves the fewest confusable pairs, and abandons any branch that cannot be saved.

Shannon's structure stops being a test you run at the end and starts steering the build from the first move.

Finally the boards came in the thousands. There were still tight spots where I had to raise the retry limit, loosen a timeout, or reseed the generator, but it runs reliably enough that there is now a library of more than 9,000 playable puzzles, about ten years of dailies, already lined up.

The number, re-seen

I had the title of all this backwards. The number nobody knows is not some far-off constant at the frontier of mathematics; it lives right here, on every board, in miniature.

Every move in the game is a tiny question with one numbered answer: which cell does this letter belong in? Almost always the surrounding words pin it down and you answer without thinking. But let two cells both fit the served letter, both spelling real words, and suddenly there are two answers with nothing on the board to choose between them. For that instant the right cell is genuinely unknowable: not hard or slow to work out, but impossible to work out from the letter alone. A number nobody knows.

The cure is the oldest move in communication, the one you already make spelling your name down a bad line: add context. Uncover one more tile, settle one more crossing word, and the unknowable snaps back to a single forced cell.

Alfred North Whitehead once said, "Seek simplicity and distrust it," a lesson I seem destined to relearn countless times. You, on the other hand, get to trust it completely: a letter, a cell, and never once a guess. It's a real game, Motplot. Give it a try if you're interested.

Read on Hacker News ↗ ← Back to News