Claude Opus 4.7 Model Card

adocomplete

Comments (70)

bachittle
So Opus 4.7 is measurably worse at long-context retrieval compared to Opus 4.6. Opus 4.6 scores 91.9% and Opus 4.7 scores 59.2%. At least they're transparent about the model degradation. They traded long-context retrieval for better software engineering and math scores.
vessenes
This is an interesting document, in that it reads like a Claude Mythos model card that was hastily edited to be an Opus 4.7 model card. I surmise that someone at the top put the Mythos release on hold, and the product team was told "ship this other interim step model instead. quickly." I wonder if 4.7 will be seen as a net step-up in quality; there are some regressions noted in the document, and it's clearly substantially worse than Mythos, at least according to its own model card. Should be an interesting few months -- if I were at oAI I'd be rushing to get something out that's clearly better, and pressing for weakness here.
koehr
This reads more like an advertisement for Mythos, at first glance
kube-system
> Chemical and biological weapons threat model 2 (CB-2): Novel chemical/biological weapons production capabilities. A model has CB-2 capabilities if it has the ability to significantly help threat actors (for example, moderately resourced expert-backed teams) create/obtain and deploy chemical and/or biological weapons with potential for catastrophic damages far beyond those of past catastrophes such as COVID-19.
That's an interesting choice of benchmark for measuring the risk of "Chemical and biological weapons"
Symmetry
> The technical error that caused accidental chain-of-thought supervision in some prior models (including Mythos Preview) was also present during the training of Claude Opus 4.7, affecting 7.8% of episodes.
>_>
STRiDEX
Dumb question, but why are chemical weapons always addressed as a risk with LLMs? Is the idea that they contain how to make chemical weapons or that they would guide someone on how? Would there not already be websites that contain that information? How is an LLM different, I guess, from some sort of Anarchist Cookbook thing?
msla
PDF, because it isn't marked.
100ms
  $ pbpaste | wc -w
  62508
  $ pbpaste | grep -oi mythos|wc -w
  331
  $ pbpaste | grep -oi opus|wc -w
  809
aliljet
Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing usage by 1.35x, does that mean a 20x plan is now really closer to a 15x plan (no token increase on the subscription) or a 27x plan (more tokens given to compensate for more compute cost) relative to Claude Opus 4.6?
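The arithmetic behind those two readings can be sketched as follows. This is only an illustration and assumes the 1.35x figure is a flat multiplier on tokens consumed per task, which the thread does not confirm; the function names are made up for the example.

```python
def task_equivalent_multiple(plan_multiple: float, usage_factor: float) -> float:
    """Plan size measured in old-model tasks if the token allowance is NOT raised.

    Each task now costs usage_factor times more tokens, so the same
    allowance covers proportionally fewer tasks.
    """
    return plan_multiple / usage_factor


def compensating_token_multiple(plan_multiple: float, usage_factor: float) -> float:
    """Token allowance needed to keep the same number of tasks as before."""
    return plan_multiple * usage_factor


# Hypothetical inputs taken from the comment: a 20x plan and a 1.35x usage increase.
print(round(task_equivalent_multiple(20, 1.35), 1))     # 14.8
print(round(compensating_token_multiple(20, 1.35), 1))  # 27.0
```

Under that assumption, an unchanged allowance works out to roughly 14.8x in old-model terms, while a fully compensated allowance would be 27x in raw tokens.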
Rekindle8090
Can someone please explain the point of these incremental upgrades? Just release one model. Then maybe do a .5. Then do the next version. What is the justification for .4.5.6.7.8.9 when the difference isn't measurable and it destroys productivity because they test the next increment on the previous one without customer consent?
bicepjai
This card is a 272 page report. So now we are redefining names :)
joeumn
I'm actually surprised at how it performed compared to 4.6 and also compared to Mythos. Will be fun to use.
il-b
Ironically, the website is down
jmward01
Haiku not getting an update is becoming telling. I suspect we are reaching a point where the low end models are cannibalizing high end and that isn't going to stop. How will these companies make money in a few years when even the smallest models are amazing?
nothinkjustai
How much do you want to bet this is Mythos, and Anthropic released it as Opus to avoid embarrassment after all the hype they whipped up…
NickNaraghi
232 pages is bullshit. Longer than the Mythos system card? What are you hiding.
nullc
The model card doesn't mention if this revision will continue to make up and fan vicious conspiracy theories like the prior one does. I've been getting a small but steady stream of harassment from mentally ill people who get spun up on crazy conspiracy theories, and Claude is all too willing to tell them they are ABSOLUTELY RIGHT, encourage them to TAKE ACTION, and tell them that people who disagree are IN ON IT. The other major LLM services will deflect to be less crazy or shut down the conversation entirely, but it seems Claude doesn't. Anthropic is probably the worst about prattling on about safety, but it seems like their concern is mostly centered on insane movie-plot threats and less on things with more potential for real harm. I've complained to Anthropic with no response.
gignico
So LLMs are destroying the economy and the environment but at least “catastrophic risk” is still low. Ok then…
deflator
Model Welfare? Are they serious about this? Or is it just more hype? I really don't trust anything this company says anymore. "We have a model that is too dangerous to release" is like me saying that I have a billion dollars in gold that nobody is allowed to see but I expect to be able to borrow against it.