Comments (70)

  • bachittle
    So Opus 4.7 is measurably worse at long-context retrieval compared to Opus 4.6. Opus 4.6 scores 91.9% and Opus 4.7 scores 59.2%. At least they're transparent about the model degradation. They traded long-context retrieval for better software engineering and math scores.
  • vessenes
    This is an interesting document, in that it reads like a Claude Mythos model card that was hastily edited to be an Opus 4.7 model card. I surmise that someone at the top put the Mythos release on hold, and the product team was told "ship this other interim step model instead. quickly." I wonder if 4.7 will be seen as a net step-up in quality; there are some regressions noted in the document, and it's clearly substantially worse than Mythos, at least according to its own model card. Should be an interesting few months -- if I were at oAI I'd be rushing to get something out that's clearly better, and pressing for weakness here.
  • koehr
    This reads more like an advertisement for Mythos, at first glance
  • kube-system
    > Chemical and biological weapons threat model 2 (CB-2): Novel chemical/biological weapons production capabilities. A model has CB-2 capabilities if it has the ability to significantly help threat actors (for example, moderately resourced expert-backed teams) create/obtain and deploy chemical and/or biological weapons with potential for catastrophic damages far beyond those of past catastrophes such as COVID-19.
    That's an interesting choice of benchmark for measuring the risk of "Chemical and biological weapons"
  • Symmetry
    > The technical error that caused accidental chain-of-thought supervision in some prior models (including Mythos Preview) was also present during the training of Claude Opus 4.7, affecting 7.8% of episodes.
    >_>
  • STRiDEX
    Dumb question, but why are chemical weapons always addressed as a risk with LLMs? Is the idea that they contain instructions for making chemical weapons, or that they would guide someone through it? Would there not already be websites that contain that information? How is an LLM different, I guess, from some sort of anarchist cookbook thing?
  • msla
    PDF, because it isn't marked.
  • 100ms
    $ pbpaste | wc -w
    62508
    $ pbpaste | grep -oi mythos | wc -w
    331
    $ pbpaste | grep -oi opus | wc -w
    809
  • aliljet
    Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing usage by 1.35x, does that mean a 20x plan is now really a 13x plan (no token increase on the subscription) or a 27x plan (more tokens given to compensate for more compute cost) relative to Claude Opus 4.6?
  • Rekindle8090
    Can someone please explain the point of these incremental upgrades? Just release one model. Then maybe do a .5. Then do the next version. What is the justification for .4, .5, .6, .7, .8, .9 when the difference isn't measurable and it destroys productivity because they test the next increment on the previous one without customer consent?
  • bicepjai
    This card is a 272 page report. So now we are redefining names :)
  • joeumn
    I'm actually surprised at how it performed compared to 4.6 and also compared to Mythos. Will be fun to use.
  • il-b
    Ironically, the website is down
  • jmward01
    Haiku not getting an update is becoming telling. I suspect we are reaching a point where the low end models are cannibalizing high end and that isn't going to stop. How will these companies make money in a few years when even the smallest models are amazing?
  • nothinkjustai
    How much do you want to bet this is Mythos, and Anthropic released it as Opus to avoid embarrassment after all the hype they whipped up…
  • NickNaraghi
    232 pages is bullshit. Longer than the Mythos system card? What are you hiding?
  • nullc
    The model card doesn't mention if this revision will continue to make up and fan vicious conspiracy theories like the prior one does. I've been getting a small but steady stream of harassment from mentally ill people who get spun up on crazy conspiracy theories, and Claude is all too willing to tell them they are ABSOLUTELY RIGHT, encourage them to TAKE ACTION, and tell them that people who disagree are IN ON IT. The other major LLM services will deflect to be less crazy or shut down the conversation entirely -- but it seems Claude doesn't. Anthropic is probably the worst about prattling on about safety, but it seems like their concern is mostly centered on insane movie plot threats and less on things with more potential for real harm. I've complained to Anthropic with no response.
  • gignico
    So LLMs are destroying the economy and the environment but at least “catastrophic risk” is still low. Ok then…
  • deflator
    Model Welfare? Are they serious about this? Or is it just more hype? I really don't trust anything this company says anymore. "We have a model that is too dangerous to release" is like me saying that I have a billion dollars in gold that nobody is allowed to see but I expect to be able to borrow against it.
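
On aliljet's subscription question above, a rough sketch of the two readings may help. It takes the 1.35x usage figure from that comment at face value and assumes usage scales uniformly across all requests, which the thread does not confirm. The function names are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical helpers; the 20x plan and the 1.35x usage factor come from the comment.

def effective_multiplier(plan_multiplier: float, usage_factor: float) -> float:
    """Token budget unchanged: each task costs more, so the plan buys less work."""
    return plan_multiplier / usage_factor


def compensated_budget(plan_multiplier: float, usage_factor: float) -> float:
    """Token budget scaled up so the plan buys the same amount of work as before."""
    return plan_multiplier * usage_factor


if __name__ == "__main__":
    print(round(effective_multiplier(20, 1.35), 1))  # 14.8
    print(round(compensated_budget(20, 1.35), 1))    # 27.0
```

Under these assumptions the fixed-budget case comes out nearer 14.8x than 13x; the 13x figure looks like 20 reduced by 35%, which is a different calculation from dividing by 1.35.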