“All understanding whatsoever is, at bottom, metaphorical.”

— Iain McGilchrist, The Matter With Things

Apple and pear

A friend at a local startup told me they’ve been playing 20 questions with reasoning models—but with the model as the answerer. The model is invited to think of something, then the humans ask yes/no questions to guess it, reading the thinking traces as they go.

In one round, the model picked “apple.” The humans could see this in the trace. But partway through, the model got confused and started answering as though it had picked “pear.” The humans knew it was wrong about its own secret. The model didn’t—it couldn’t see the trace showing what it had originally chosen.


We’re well into the era of inference-time compute. The “reasoning traces” you see in ChatGPT, Claude, and the terminal-based tools are summaries of thinking tokens—output the model generates to work through a problem before giving you its final answer.

But here’s the thing: those thinking tokens work exactly like regular output tokens. The model produces them one at a time, each token informed by everything that came before. When it’s done thinking and starts writing the actual response, it can see what it “thought”—but only because those thoughts are now part of the text.

There’s no separate scratchpad. No persistent internal state. The model’s “train of thought” exists only to the extent that it’s been written down.
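
To make that concrete, here is a toy sketch of the loop in Python. The names are mine (`sample_next_token` is a made-up stand-in for the model itself, not any vendor's API); the point is the data flow: thinking tokens and answer tokens accumulate in one shared sequence, and that sequence is the only state there is.

```python
# Toy sketch of autoregressive generation, not any vendor's real implementation.
# sample_next_token() is a hypothetical stand-in for the model; the point is that
# "thinking" and the final answer accumulate in one shared sequence.

def generate(prompt_tokens, sample_next_token, max_tokens=1000):
    sequence = list(prompt_tokens)            # the only state the model has
    for _ in range(max_tokens):
        token = sample_next_token(sequence)   # conditioned on everything written so far
        sequence.append(token)                # thoughts and answer land in the same list
        if token == "<end>":
            break
    return sequence

# No second buffer holds the thoughts. If the thinking span is later dropped from
# the transcript, there is no other record that it ever existed.
```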

A human playing 20 questions as the answerer remembers what they picked. Even without saying it out loud, they hold the answer in mind and check each response against it. The reasoning persists independently of whether it was spoken.

Models don’t have this. They only remember what they’ve said out loud. The model that picked “apple” had no way to check its later answers against that choice—the thinking trace where it decided on “apple” was invisible to it by the time it started answering “pear.” It had no privileged access to its own prior reasoning. Just the text of its responses.
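
Mechanically, a multi-turn exchange looks roughly like the sketch below. It is a minimal illustration, with a hypothetical `complete()` call standing in for a chat API; if earlier thinking was never written into the transcript being resent, or was stripped before the next request, it no longer exists as far as the model is concerned.

```python
# Minimal sketch of a multi-turn exchange. complete() is a hypothetical stand-in
# for a chat API call; many APIs handle prior-turn thinking in roughly this way.

transcript = []  # the only memory that survives between turns

def ask(question, complete):
    transcript.append({"role": "user", "content": question})
    reply = complete(transcript)   # the model sees this list and nothing else
    # Only the visible reply is kept. Any thinking the model did to produce it
    # is not appended, so on the next turn it is gone from the model's view.
    transcript.append({"role": "assistant", "content": reply})
    return reply
```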


Uncle Bob Martin put it well this week:

Claude codes faster than I do, by a significant factor. Claude can hold more details in its “mind” than I can—again by a significant factor.

But Claude cannot hold the big picture in its mind. It doesn’t really even understand the concept of a big picture. Architecture is likely beyond its capacity.

He’s right, but I’d frame it differently. It’s not that Claude can’t understand big pictures. It’s that Claude can’t hold anything. Every response starts from scratch, reconstructing context from the transcript. The “mind” that holds more details than Uncle Bob is an illusion created by the context window—a finite text buffer, not a persistent understanding.

This is why architecture is costly for models, not hard. They can inhale a codebase fast enough—but they have to reload it into context every time. There’s no persistent grasp of the structure, just repeated reconstruction. And they still don’t remember why something was structured a certain way unless that reasoning was written down somewhere they can read.

The 20 questions game revealed this starkly: the humans had better access to the model’s “mind” than the model did. They could read the trace; the model couldn’t. Its own prior reasoning was invisible to it.

Andrej Karpathy calls them “people spirits”—stochastic simulations of people, with a kind of emergent psychology. He compares them to the protagonist in Memento: anterograde amnesia, unable to consolidate knowledge, every conversation a fresh start with only their context window as working memory.

“People spirits” is a weird phrase if you stop to think about it. But there’s a micro-community of people who code with LLMs every day—following the Latent Space podcast, hanging on Dwarkesh Patel interviews, absorbing Karpathy’s framing—and the jargon becomes comfortable.

These metaphors are useful, not true. The danger is that if you’re steeped in them, you import assumptions without noticing. You hear “thinking” and assume the model knows what it thought. A newcomer asking “wait, can it actually see its own reasoning?” wouldn’t make that leap; these are subtle errors that only creep in for insiders. This is downstream of the models being a fundamentally new category of thing. We need metaphors to understand them, but if something is categorically new, every metaphor will fail somewhere. Asking fundamental questions gets you a better grip on the reality, even if it’s less comfortable and doesn’t leave you with easy vocabulary to explain how things work.

When you understand this, some model behavior makes more sense. The way extended conversations drift. The benefit of writing explicit reasoning into prompts rather than assuming the model will “figure it out.”
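
As a hypothetical illustration of that last point, using the game from the top of the post (this prompt is mine, not my friend’s): make the model write the decision somewhere it will be able to read it later.

```python
# Hypothetical prompt for the 20 questions setup: force the secret choice into the
# visible transcript, so every later turn can be checked against it.
answerer_prompt = (
    "You are the answerer in 20 questions. First, write your secret word inside "
    "<secret>...</secret> tags. Before each yes/no answer, restate the secret to "
    "yourself and check your answer against it."
)
# A real game would hide the <secret> span from the guessers in the UI; the point
# is that the model can only consult what is actually written in its context.
```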

This is also a subtle form of hallucination that’s easy to miss. When you ask a model “why did you do that?”, its explanation implies it knows its own thoughts. It doesn’t. The explanation is a plausible story generated after the fact, not a report of actual introspection. It’s not overtly false—it’s misleading in a way the model itself can’t detect.

The model that forgot it picked apple couldn’t know it had forgotten. Neither can the model explaining why it did what it did.