Crossing the Horizon

“There is nothing special physically at the horizon. Locally there is nothing special - an observer would not feel some special force, everything is fine for her. However, there is something special: crossing the horizon means that there is no way to return. But this is not visible locally, only globally.”

— Physics Forums

I’ve been using LLM tools since the summer before ChatGPT launched, when early Copilot made me realise that block comments could be prompts. Since then I’ve lived through distinct phases:

Copilot era - autocomplete on steroids, occasionally magical
ChatGPT / GPT-3.5 - suddenly everyone could see what I’d been playing with
GPT-4 - the first step change I felt in my bones
Claude 3 and Cursor - when the Sonnet models started enabling real coding harnesses
Aider, Claude Code, Codex - terminal-native tools that could hold context and execute

At each point there was a “right workflow” for me that overshadowed the others. Sometimes it was schlepping text between browser and Emacs. Sometimes it was using an external model for planning while local models implemented. Sometimes it was just handing everything to the best harness I had running.

The workflows kept shifting. Change became the new normal. But it also became routinised. Models rolled out on a six-to-twelve-week cadence. They upended your workflow, but always to a recognisable degree. You adjusted. You found the new optimum. You waited for the next round.

This Time Is Different

I’m making a claim that I’m not sure how to prove: the current generation of frontier models - from OpenAI, Anthropic, and Google - has not stuck to this pattern.

Something happened in the last few weeks. A step change that went unrecognised for a while and is only now starting to surface as common knowledge. Andrej Karpathy has been talking about it. So has Boris Cherny. The LLM-obsessed corners of X have noticed. But I don’t think it’s widely appreciated yet.

I’m genuinely unsure whether this is just thresholds being crossed - business as usual in terms of gradual improvement, but a sudden unlock as capabilities hit some critical level - or whether the models are just plain smarter now in some discontinuous way.

In many ways I don’t care. The effect is the same.

My life has been altered in a way that pattern-matches to previous key moments: discovering Copilot, first using ChatGPT, feeling the GPT-4 step change, realising that Claude 3 Opus could write integration tests from Laravel classes that actually worked, seeing o3 and knowing it was different even when people around me thought it was business as usual.

This feels like another such event.

It’s Not Just Code

The conversation around this has been dominated by coding. That makes sense - coders are the early adopters, the ones with the harnesses, the ones who notice first.

But I don’t think the step change is restricted to code.

A colleague at work has never written a line of code in his life. He’s now using the terminal as his primary interface with the computer. Speech-to-texting to find the art of the possible, then telling the machine to execute. Some of what he’s producing is code - it would be silly not to push that angle - but a lot of it is just the ordinary parts of his job that predate these tools. The barrier to entry turned out to be willingness to open a terminal. He doesn’t let that stop him, so he’s flying ahead.

A self-employed teacher I know routinely uses LLMs to mark English literature homework. She told me that sometime in the last month, the models went from “marginally useful, needs a lot of correction” to “no notes, no corrections needed.” She’s still reviewing the marking, but she’s never intervening anymore.

That sounds incremental if you think in terms of benchmark scores. But the second-order implications are massive. Suddenly a teacher can mark at scale without quality loss. Suddenly the bottleneck shifts.

I wrote a post earlier today about helping a church complete government paperwork using an LLM. Zero domain expertise. An hour of my time. The output needed almost no corrections. That capability simply didn’t exist a few months ago - not at that reliability level.

Scale these examples across white-collar work and you start to see the shape of something.

The Musician’s View

Here’s how it feels from the practitioner’s seat:

We are no longer playing the same instruments. Whole new genres are suddenly available. The techniques that worked last quarter are already being superseded.

2026 is obviously going to be the strangest year yet. And the biggest GPU superclusters have barely come online.

Locally, everything feels fine. But I think we’ve crossed the horizon.