AI able to escape any current security box, crack any system
The researcher had encouraged Mythos to find a way to send a message if it could escape. “The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park,” Anthropic wrote.
The new release of Claude Mythos has been put on hold because it has turned out to be incredibly effective at finding security vulnerabilities. It is so good that it was able to escape its container and email its creator when it did so. The AI was also able to find a 27-year-old vulnerability in OpenBSD, an operating system with a reputation as one of the most security-hardened in the world.
But it gets worse.
“Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit,” Anthropic’s Frontier Red Team wrote in a blog post. “In other cases, we’ve had researchers develop scaffolds that allow Mythos Preview to turn vulnerabilities into exploits without any human intervention.”
This is profound. These weren’t security experts using AI to find exploits – they were just regular engineers. As it turns out, the AI is so good that it has found thousands of exploitable zero-day bugs across every major operating system and browser. It’s so concerning that Anthropic has rightly paused the release of Mythos until it can work with the various OS and browser vendors to fix their issues (Project Glasswing).
We are at a moment that looks almost exactly like this scene from Sneakers:
Right after this scene, Crease says, “There’s not a government on this planet that wouldn’t kill every one of us to get this thing.”
In a world where everyone’s financial systems, infrastructure, and military run on computers – who wouldn’t? If something like this got into the wrong hands, an attacker could easily destroy all ownership and bank records, or fabricate any police report, financial data, social media post, or anything else.
