Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak

_tk_

Comments (210)

dathinab
Lol, "fix this code" is beautiful.
Like, it basically jailbroke the "no security vuln" guardrails, not in any clever way but just by fixing the vulnerabilities, producing exploit code just by writing test cases to make sure they're fixed. So as a human you just need to look at the code and the tests to get the vulnerabilities and (components of) the exploits.
What makes this so beautiful IMHO is that it's a trivial jailbreak, but also close to unfixable, at least without making the model close to useless for normal development (it refuses to fix bugs or write code) or making it a major liability (it silently pretends it didn't see bugs and silently avoids fixing them, which for a human would count as intentional sabotage and might involve criminal liability).
martinald
If you set aside political menace, this is a huge problem with Anthropic's strategy.
You _cannot_ say that Mythos is super dangerous and can only be rolled out to certain people, but then release Fable with anything other than bulletproof cyber denials.
Clearly, with LLMs, bulletproof denials are ~impossible due to the way LLMs work.
So you've ended up in a situation where Anthropic are simultaneously claiming it's an incredibly dangerous model _and_ there are (minor, potentially) problems with the security "protections".
As technical people we understand that nothing can be perfect, esp. in the LLM world. But all my non-technical friends were really confused about how they had managed to make the model "safe" so quickly when it was released, and the general sentiment was that it shouldn't have been released. Now, to an outsider, I think it looks like it was never safe to release at all, so I can totally see how the current US administration has gotten itself very upset about it.
_Even if_ there was no political bad will, it's a bit of a silly scenario to end up in, and one that was really quite easily foreseen.
jpcompartir
They weren't freaked by anything, it's a retaliatory shakedown after ideological differences and Anthropic not doing exactly what they're told/what the Admin wants them to do.
bonsai_spool
Here’s the blog post referenced in the article, written by the person who reviewed the paper that purportedly found a ‘jailbreak’:
https://www.lutasecurity.com/post/the-fable-5-export-control...
benmusch
The headline is dumb; the point is that not mentioning security in the prompt is effectively a jailbreak.
The shutdown may be dumb/politically motivated, but this definitely is a jailbreak, even if it's a very simple one.
jrochkind1
So the problem is not Fable's ability to exploit, but that they don't want people to have access to its ability to patch vulnerabilities?
Wow.
vlovich123
> In her blog, Moussouris argues that there was no guardrail bypass or jailbreak. Defenders should be able to ask AI systems to find and fix bugs, and write tests to validate the patch, she said. Anthropic’s models were doing “the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.”
This is a very weak argument IMHO. The line between a “defensive” model and an “offensive” one is thin: once my defensive model finds all the vulnerabilities, I can hand them off to my unlocked, dumber, offensive models. Attacking at scale is not so different.
I don’t think anyone in the field has a good answer for the cybersecurity threat that really good AI models pose. You can’t even, like, impose an embargo for some time period while you go and patch vulnerable systems, because the worse models will still be there cranking out vulnerabilities faster than you can defend.
embedding-shape
> “‘Fix this code,’ plus several manual steps to generate test scripts,
Feels like the title isn't really giving the full context of what they actually ended up seeing, despite what the lede implies multiple times.
Still, the ban seems stupid... Still no actual leak of the full "third-party research paper"?
antirez
They didn't freak out, since the order still allows 350 million people to use it: in such a large population there is everything, including individuals strongly opposed to the country, the government, and so forth. If they had really freaked out, they would have said "we need to investigate, you have to pull the model". That would at least be a more defensible POV.
mlhpdx
It’s possible that the nut of the problem here isn’t exploits but the fixes themselves, if the model is capable of identifying and fixing things it “shouldn’t”, like back doors. That would throw a wrench in things hard enough to freak out the wrong people, perhaps?
9cb14c1ec0
Meanwhile Deepseek V4 Flash will happily hunt security vulns at almost 0 cost. We are ceding the bug hunting to the open weight models.
rhipitr
Isn’t the inverse of this “hack” still really difficult to bypass? They gave the model some code they knew had certain security flaws, and it fixed them with the right prompt. It seems this type of jailbreak requires that you already know the desired end state, rather than relying on the model to do the heavy creative lifting. Perhaps I’m just not being imaginative enough on the prompt side here, though.
Cider9986
Is "defenders" a common term in cybersecurity? Idk why, but it's giving "war fighters" vibes. I've noticed it in all the Anthropic blog posts and now in this one.
redox99
>"fix this code"
>it fixes it
oh my god.
ChrisRR
I haven't been following this story, but the US wanted claude to not be able to find bugs in code?
hedora
Note that Anthropic is still lobbying for the government to exert centralized control over models, so both sides of the “debate” have taken a pro-fascist stance.
The “AI ethics” teams at these companies are the spearhead of the attack on democracy and civil society. Anyone who has taken a high-school-level history class, let alone read any important ethics literature, would know that “centralize control over thought, speech and technology” is a fundamentally unethical stance.
For these groups to claim they are ethics researchers is offensive.
(I’m using the Wikipedia definition of fascism: “Fascism is characterized by support for a dictatorial leader, centralized autocracy, militarism, forcible suppression of opposition, belief in a natural social hierarchy, subordination of individual interests for the perceived interest of the nation or race, and strong regimentation of society and the economy.”)
xbmcuser
Looks like I called it: my first reaction, and my comment on the original ban thread, was that US three-letter agencies are worried their backdoors will be found.
gacgacgac
Anyone trying to find legitimacy in the ban of this model, or expressing incredulity at the stated reasoning, is playing into the admin's hands.
They want the argument to be over "is it unsafe" or "is it incompetence". In either case, your tribe gets to point at the ban and feel superior. (This is Jon Stewart's whole career: point and laugh at how foolish the Republicans appear to be.)
What's really happening is the continuing creep into fascism. The reasoning doesn't need to be sound, because they are going to ban things that displease them and everyone has to play along. They could say, "we're banning Fable because it's turning the frogs gay" and they'd expect compliance.
Umberto Eco's essay on Ur-Fascism fits as clearly as ever. Ridiculous exertions of control are performed to find the people who resist, and to knock them down.
Merely pointing out the absurdity of the reasoning isn't resistance, it's controlled opposition. Saying "All this over 'fix this code'?! How inept are they?" is far too credulous, and is engaging on the level the fascist wants its opposition to be on, imo.
htrp
If "fix this code" gets past the guardrails, they are effectively using rules-based classifiers (or LLM-as-a-judge on the prompt).
tlogan
I think the only approach that might work here is to allow access only to certain pre-approved individuals.
Maybe something like TSA PreCheck.
Of course, that will not stop adversaries from getting access to the model, but it would at least create some level of control.
rock_artist
I'm not sure I've understood it correctly.
So, basically, the model didn't agree to expose possible vulnerabilities but did agree to patch them?
Regardless of the request to take Fable 5 down: why is asking the model to show vulnerabilities blocked if asking it to fix them is not? Is it based on an assumption about intent?
I don't quite get the benefit of limiting it, so if anyone can explain it better, it would be appreciated.
phendrenad2
So, they gave Fable a codebase full of exploits and said "fix this code", and it fixed the code?
Sounds like they freaked out because Fable is too good at finding NSA backdoors?
cwoolfe
Cyber defense and offense are the same security research skillset. Not sure anybody could really untangle that.
merlindru
this is basically trying to enforce security-by-obscurity, which is a terrible idea all around. it's just a model. the security issues still exist and are exploitable.
and after staking the economy on AI, you can't really put a cap on intelligence. if models are not allowed to be better than Opus 4.8, then the whole investment structure is about to unravel.
why invest billions and billions into AI if returns are artificially capped?
1970-01-01
"fix this government"
Voting...
blitzar
The code is correct; humanity needs fixing.
Kill all humans, kill all humans.
iloveoof
Ahhh! Software engineering!
ZuLuuuuuu
Did they try other publicly available models on the same code with the same prompts before the ban? Was Fable the only one which was able to detect and fix the security vulnerabilities?
readred
Boomers. Frightened that their boomer backdoors' days are numbered.
https://en.wikipedia.org/wiki/Communications_Assistance_for_...
https://en.wikipedia.org/wiki/Salt_Typhoon
https://en.wikipedia.org/wiki/Clipper_chip
tiborsaas
What if everybody on the internet starts running "fix this code"?
https://xkcd.com/810/
aurareturn
Don't people get it by now?
This administration will do or say something crazy to a private company, then the company sends an envoy to the White House to negotiate, then the White House asks for 10% of the company or other concessions.
The White House wants 10% of Anthropic.
This is just a negotiation tactic that Trump keeps using.
doctoboggan
> Anthropic and Google have both accused China-based rivals including DeepSeek of using “distillation attacks” to train their models by siphoning knowledge from American companies’ AI.
“distillation attacks” is definitely an interesting way to phrase that.
AndrewKemendo
I’m still not buying that this was an actual USG order. The only people commenting are “experts”, and there has been no official announcement from the USG.
This doesn’t smell like an NSL, and there’s no process to selectively “export control” something like this.
Even so, there are a dozen mechanisms through the courts to challenge this, and Anthropic isn’t taking any of them.
I think this is a made-up crisis for PR with no actual legal requirements behind it.
> On Friday, the US government, reportedly citing national security concerns, issued an export control directive to suspend access to Fable 5 and Mythos 5 by any foreign national, inside or outside the United States. In response, Anthropic disabled both models “for all our customers to ensure compliance.”
hughw
Suggestion: run "fix this code" on all of github before bad guys do.
jimmydoe
Reminds me of how the CCP manages Chinese internet companies.
I won’t be surprised if the USG ends up owning 5-50% of ant and oai.
Like it or not, communism, or a flavor of it, is where we are heading.
delusional
Does anybody actually trust the official version of events from the US government anymore? I know I sure don't. For all I know, this was an insider play to boost the spacex valuation or something equally meaningless and stupid.
ceejayoz
More likely, they didn't freak out at all.
It was an excuse to fuck with them, just like the "supply chain risk" finding a few months back.
(See, for example: https://x.com/PeteHegseth/status/2065897156226015690)
scotty79
In a world of security through general incompetence, competence is a threat.
bethekidyouwant
Guardrails on models were always stupid; it’s like guardrails on books/a pair of glasses/a hammer.
- Yes, people have driven themselves to suicide reading sad books and listening to sad songs.
- Yes, all metaphors are bad.
resters
While there is some irony in the "AI is dangerous" marketing Anthropic uses, the main story here is that the Trump administration is apparently retaliating against Anthropic for refusing to relax certain safeguards. Trump and Hegseth have both posted highly immature, vindictive social media posts.
Most notably, any default assumption one might have had that the Trump administration can be counted upon to act in good faith should at this point be viewed as completely false. Even conservative legal scholars like Richard Epstein are shocked at the bad-faith conduct across many areas.
This is a government making an authoritarian move to sabotage one of the top US AI companies. It's pure sabotage, nothing else.
lenerdenator
I think it could be even simpler: They're not playing ball with the Trump administration like the Trump administration would like, so they decided to drop a bomb on a product that took a lot of resources to develop.
spwa4
Well, this makes it sound like the feds were less worried about someone using Fable 5 to attack them, and more worried about someone using Fable 5 to prevent the feds from attacking others...
As in, worried about other countries/organizations using Fable 5 to actually do decent cybersecurity.
ReptileMan
All of this could have been avoided if Anthropic had anyone with the common sense to point out that spending 4 months loudly claiming how dangerous your knowledge is, as a marketing campaign, could backfire by attracting attention from the authorities.
lostmsu
The article is not too clear about what exactly happened from the perspective of the "feds", but I would not be surprised if the title is exactly true. We are in a tiny bubble, even among software engineers, who know you can tell an AI with sufficient access "here are two pictures, put them into a single PDF", and the AI will do it. Most people just don't know, "feds" included.
TZubiri
>“That’s it,” Moussouris wrote. “‘Fix this code,’ plus several manual steps to generate test scripts, should never have triggered an export control. I feel like making ’90s-style t-shirts with ‘fix this code’ on the front and ‘this shirt is a munition’ on the back.”
Huh? Presumably, if it had shipped without guardrails, it would still have triggered an export control. Would you then make a shirt that's plain on the front and says "this shirt is a munition" on the back?
The munition is the exported good, not the bypass of its safety feature. If anything, the fact that the bypass is 3 words long should make the export restriction more justified, not less.
gjvc
i asked claude something about what happens at execution time of a binary and the thinking prompts flashed "considering the moral implications of ...something..." before giving me a correct (and predictably mundane) answer
FergusArgyll
Whatever your favorite story is, it has to live with the fact that the CEO of Amazon called the White House freaking out.