Claude Sonnet 5

marinesebastian

Comments (385)

XCSme
I just tested it on my benchmarks[0]: it's GLM-5.2 level, at 2x cost, but also 2x faster. Weak spots (categories it fails):
- Trivia: 0/3, basically not much built-in knowledge
- Combined tool-calling tasks: score 45/100, sometimes makes invalid tool calls
- Puzzle Solving: score 77, flubs carwash-like tests
[0] https://aibenchy.com/compare/anthropic-claude-sonnet-4-6-med...
simonw
Claude Sonnet 5 itself described its pelican as looking like a goose:
> Illustration of a white goose riding a bicycle, with one wing extended forward to grip the handlebar, set against a plain white background with a brown ground line.
https://simonwillison.net/2026/Jun/30/claude-sonnet-5/
doctoboggan
The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.
microtonal
> Claude Sonnet 5 is built to be the most agentic Sonnet model yet. It can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.
I have been using Sonnet 4.6 more than Opus, because I'm mostly doing agent-assisted development and not fully agent-driven development. This announcement does not make me optimistic: I have found that the more models are optimized for fully agentic development, the worse they get at assisted development, and they often start doing too much despite very strict/specific instructions.
I have been moving more and more to K2.7 Code and GLM-5.2 over the last few weeks. They are often good enough for assistance, very fast, and cheap.
phtrivier
What is the reference (unbiased, honest, reputable, and trustworthy) site that ranks and compares models on the few realistic metrics that matter? ("Does it work for code?", "No, I mean, for real", "How much does it cost?", etc.)
Jcampuzano2
I'm struggling to understand why I'd ever use this instead of just using a lower effort level for Opus, given that on many of the listed benchmarks the cost per task rises above Opus's at anything higher than medium effort.
The only thing I can think of is for when someone is out of Opus credits. Of course there are API billing use cases, but I'd probably still just use Opus on low.
conradkay
Wow, seems worse even on price/performance than GLM 5.2, which is only 744b parameters.
From the system card: "On CyberGym vulnerability discovery, Claude Sonnet 5 is less capable than Sonnet 4.6, and far less capable than Opus 4.8 and Mythos 5. As with the other evaluations in this section, these results were achieved with all safeguards turned off. When run with our default mitigations, Sonnet 5 scored a 0 on CyberGym"
__natty__
I wonder if they somehow used licensed training data for the current model, or if we are fine with the fact that even "western and rightful" models are still stealing copyrighted work. Is there no end to this dichotomy, where a large company can do more and steal someone's work for profit while someone else is jailed or fined for downloading and reading a pirated book?
Sol-
Wonder if the whole cyber paranoia leads to their models ultimately generating less secure code. After all, if it has the ability to generate safe code, it would imply that it knows something about cybersecurity, which could surely be used to hack all the banks in the world.
phillipcarter
Seems to be another great incremental update to the workhorse, nice!
I've been using Sonnet instead of Opus for almost all coding tasks for a while now. A little elbow grease to break down tasks and you can spend a lot less money for just about the same output quality.
m3h
Important to note: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer that changes how the model processes text to improve performance (this is similar to the tokenizer change we introduced with Claude Opus 4.7). The tradeoff is that the same input can map to more tokens: roughly 1.0–1.35× depending on the content type. The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral."
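A quick way to see what the quoted tokenizer range means for a bill: spend on a fixed workload scales with the token multiplier times the ratio of new to old per-token price, so the change is only cost-neutral when the price ratio drops to 1/multiplier. A minimal Python sketch; the function names are made up for illustration, and 1.15 is just an arbitrary midpoint of the quoted 1.0–1.35× range, not a figure from the announcement.

```python
def relative_cost(token_multiplier: float, price_ratio: float = 1.0) -> float:
    """Spend on a fixed workload relative to before (1.0 = cost-neutral).

    token_multiplier: new tokens per old token for the same text
                      (1.0 to 1.35 per the quoted announcement).
    price_ratio: new per-token price divided by old per-token price.
    """
    return token_multiplier * price_ratio


def break_even_price_ratio(token_multiplier: float) -> float:
    """Price ratio at which the tokenizer change is exactly cost-neutral."""
    return 1.0 / token_multiplier


# 1.15 is an arbitrary midpoint, chosen only for illustration.
for mult in (1.0, 1.15, 1.35):
    print(
        f"{mult:.2f}x tokens: {relative_cost(mult):.2f}x spend at an unchanged price, "
        f"cost-neutral at {break_even_price_ratio(mult):.2f}x the old price"
    )
```

So at an unchanged per-token price, the worst case in the quoted range is about 35% more spend, and a price cut to roughly 0.74× the old price would be needed to offset it fully.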
satvikpendem
> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.
Why would they brag about something like this? It's like they know people want to use models to perform cybersecurity tasks, yet knowingly deny them the ability.
And Opus 4.8 is still cheaper for a higher pass rate (let alone open-weight models like GLM 5.2), so I'm not sure why I'd use Sonnet except at the low effort level for, I suppose, trivial tasks where I'm fine with it working only 50% of the time, judging by the graph. The pricing doesn't really make any sense.
brunooliv
I only wish for Opus 4.6 from earlier this year at a faster inference speed. Since Opus 4.6, things have been so much messier, and the overall push for more agency isn’t really panning out for agent-assisted development as much as they would like.
theLiminator
Seems like the way to go for any smaller model is to only use the low reasoning levels, and for anything where you'd want it to reason harder, to just use a larger model.
In effect, high reasoning only makes sense when you're using the frontier model and need extra performance (higher levels of reasoning are never Pareto-optimal unless you're at the largest model size).
johnfahey
Judging from those cost-performance graphs, Sonnet doesn't make sense to run at anything higher than a medium reasoning level, since Opus 4.8 low reasoning outclasses it for the price.
This line as a selling point is also pretty funny:
> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.
mag7269
When can we get a new Haiku? 4.5 came out nearly a year ago, and it's showing its age.
wolttam
I didn't think they'd actually release a model that was worse than the open-weight frontier and at a higher price-point. Wow.
SkitterKherpi
$5/$25 for Opus 4.8 vs $3/$15 doesn't seem cheap enough to be worth it. It depends how much better it is than e.g. Mimo, but I imagine Mimo and co. are too cost-efficient in the lower tier to be overtaken by Sonnet for most tasks.
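Since both quoted price pairs differ by the same factor (5/3, about 1.67×), Sonnet is only cheaper per task while it needs fewer than about 1.67× as many tokens as Opus for the same job. A rough Python sketch of that arithmetic, reading $3/$15 as Sonnet 5's full price in dollars per million input/output tokens; the dictionary keys are plain labels rather than API model IDs, and the token counts are hypothetical.

```python
# USD per million tokens (input, output), as quoted in the comment.
# Keys are illustrative labels, not real API model identifiers.
PRICES = {
    "opus-4.8": (5.0, 25.0),
    "sonnet-5": (3.0, 15.0),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task for a given token usage."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical usage: Opus finishes a task in 200k input / 20k output tokens,
# while Sonnet needs 1.8x the tokens (e.g. more turns at a higher effort level).
opus = task_cost("opus-4.8", 200_000, 20_000)
sonnet = task_cost("sonnet-5", 360_000, 36_000)
print(f"Opus: ${opus:.2f}, Sonnet: ${sonnet:.2f}")
```

With these made-up counts it prints Opus: $1.50, Sonnet: $1.62, so once the cheaper model needs more than about 1.67× the tokens it ends up costing more per task.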
garo-pro
Seems like the cyber detection is even on Sonnet now. https://support.claude.com/en/articles/14604842-real-time-cy...
DonsDiscountGas
I'd love if they would include speed (though I know there are difficulties involved). At this point the quality of Opus 4.8 is no longer my limiting factor, it's the speed, so a faster model would be great.
mchusma
This is a much more interesting model at $2/$10 (their launch pricing) than at full price. There are many competing models at around this level of performance.
I also like that the difference between low, medium, high, and xhigh seems more spread out, which is actually a good thing for people trying to tune applications. Running Sonnet 5 on low with the launch pricing makes this potentially a better fit than Haiku or open-source models for some tasks. I don't think it will make sense at full price.
alvis
Ironically, the key message of today's release is that Sonnet 5 is far less capable than Opus 4.8 and Mythos 5. It's been a funny development over the past few weeks.
tokengod
That’s nice, but we want Fable
edude03
Let’s see how long until Opus 5 comes out, but to me this lends some credence to the rumour that Fable/Mythos was supposed to be Opus 5.
827a
Tbh we'll see what using it looks like, but the reasoning/cost charts do not look promising. It seems like the only useful reasoning level for Sonnet 5 is Low; medium might trade blows at price/performance with Opus, but anything beyond that Opus is Just Better.
I struggle to understand where this model fits in. If I need a cheap model for simple stuff (like summarizing an email), I'd go Haiku (actually, I'd go Deepseek v4 Flash, but you catch my drift). I just can't think of many tasks where I'm like "yeah let me reach for Sonnet Low Reasoning so I can save a dollar but also seriously run the risk of it failing"; I'd just reach for Opus Low.
babelfish
System Card: https://www-cdn.anthropic.com/d9bb04416ffe1352af84721476c1fa...
johnhamlin
Kind of hilarious how much they’re touting that it sucks at cybersecurity like it’s a feature
chipgap98
Interesting that tasks on extra high effort cost almost the same as on Opus 4.8, with slightly worse performance.
oybng
In my case, 4.6 degraded massively over time. 5 fails the same basic tasks that I gave 4.6 yesterday. And quite frankly this low, med, high, extra, max, turbo, ultra, ludicrous nonsense is getting tiresome
ThouYS
Why did this get the coveted "5"? I want an Opus that can compete with GPT 5.5
andai
Opus 4.8 beats Sonnet 5 on the Pareto frontier in several of their graphs (Agentic Search, Agentic Computer Use).
In other words, for certain tasks, Opus 4.8 is cheaper than Sonnet 5 and does better than Sonnet 5.
I've noticed this pattern on a lot of benchmarks. You can try to emulate a bigger model by ramping up the test-time compute (max reasoning, more turns, model fusion, etc.), but you can't reach the same quality level, and you often exceed the cost you would have paid by just using a bigger model.
tldr: if you're doing something hard, just use a bigger model.
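The Pareto argument can be made concrete: a configuration (model plus effort level) is only worth running if no other configuration is both at least as cheap and at least as good. A minimal Python sketch of that dominance check; the run names and their numbers are made up for illustration and are not read off Anthropic's charts.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    cost: float   # dollars per task (lower is better)
    score: float  # benchmark pass rate in percent (higher is better)


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep the runs that no other run beats on both cost and score.

    A run is dominated if another run costs no more, scores no less,
    and is strictly better on at least one of the two.
    """
    frontier = []
    for r in runs:
        dominated = any(
            o.cost <= r.cost and o.score >= r.score and (o.cost < r.cost or o.score > r.score)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.cost)


# Made-up numbers for illustration only.
runs = [
    Run("small-low", 0.5, 55.0),
    Run("small-high", 2.0, 66.0),
    Run("big-low", 1.5, 70.0),
    Run("big-high", 4.0, 78.0),
]
print([r.name for r in pareto_frontier(runs)])
```

With these invented points it prints ['small-low', 'big-low', 'big-high']: the small model at high effort drops out because the big model at low effort is both cheaper and better, which is the pattern described in the comment. The pairwise check is quadratic, which is fine for the handful of points on a benchmark chart.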
cenobyte
> Claude Sonnet 5 is built to be the most agentic Sonnet model yet.
or
The Dodge Charger is built to be the most Charger-like car yet.
theplumber
Is there any reason to use Sonnet instead of GLM?
docheinestages
But does it burn tokens just like Opus? That's the feeling I have nowadays. Regardless of what model I choose, the 5-hour limit gets exhausted in the first hour or so.
alvis
What I'm starting to hate is that each model's effort level can mean completely different power.
Today, Sonnet 5's medium effort level is equivalent to Sonnet 4.6's low effort level :/
m3h
Why is Claude Sonnet 5 allowed to be released but OpenAI Terra not? Are they not the same class of models?
Cu3PO42
Sonnet 5 is not currently available in the EU region on Bedrock, whereas previous models were and still are. I wonder if this is only due to early stages of the rollout or if this is due to recent US restrictions.
Unfortunately, that means I won't be using it at work for now.
rw2
The "cheaper models" from the big AI companies are next to useless, as they don't even score as well as the open/super cheap Chinese models. Only the big frontier models like Fable and Opus have value.
kingjimmy
interesting footnotes: "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer... can map to more tokens: roughly 1.0–1.35× depending on the content type." AKA expect higher costs on Sonnet 5 vs Sonnet 4.6 for the same tasks.
OsrsNeedsf2P
Great timing. I just started using Claude Sonnet for a long-term reverse engineering project[0] on a game I used to play as a kid. Cheaper tokens that are still sufficiently smart, combined with hard verification, make it a perfect combo for the task.
[0] https://github.com/dginovker/BFME-Source-Code/
whh
It's not Fable, but I'll take it.
arendtio
> Evaluations also show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.
It seems being incompetent is a feature now...
swe_dima
Not sure what niche it's going to occupy: too expensive for its intelligence category.
tripleee
interesting how much worse the sentiment around Anthropic is getting
primaprashant
Based on both performance vs price charts, it seems using Opus 4.8 with med effort is almost a better choice than using Sonnet 5 at xhigh effort
scottfits
> the computer use evaluation OSWorld-Verified. Sonnet 5 (orange line) is a strict improvement over Sonnet 4.6
Cool to see, still waiting for models to get better at computer use.
SoKamil
I believe that’s going to be the meta for agentic coding this year for enterprises: cost-optimized models approaching SOTA capabilities on software engineering, but without cybersec training.
beernet
Anthropic's run on the model and product side of things is highly impressive. They got Sam A. punching the air consistently, which is well-deserved and self-inflicted above all.
jerrygoyal
It's actually a huge update for building products, given most tasks are sub-agent driven where Sonnet is used, steered by Opus.
docproof
The jump in reasoning quality is noticeable. What's interesting is how it handles ambiguous instructions now — it seems to ask fewer clarifying questions and just makes a reasonable judgment call. That's a double-edged sword depending on your use case.
mellosty
Sonnet seems to be really expensive
baalimago
Not looking great for an upcoming IPO
benjiro29
Anybody notice that they did not include Sonnet 5 Max in the "Agentic Search" results when comparing to Opus 4.8? Based upon the "Agentic Computer Use" results, Sonnet 5 Max would have been off the "Agentic Search" chart, lol.
In short, Sonnet 5 Low/Medium is more cost-efficient if it's a task below Opus 4.8 Medium. For the rest, it's expensive and you're better off using Opus 4.8.
Why even release this model?
mellosty
It does not pass the "I want to wash my car, should I drive or walk"
smallerfish
Ah that's why Opus has been so slow for the last couple of days.
prmph
So many things to think about regarding these "benchmarks":
- Do the ever-increasing scores mean we will soon have models that approach 100%? And what would that even mean? That there is no more room for improvement?
- Would Anthropic (or any other model vendor, for that matter) ever release a newer model that scores lower? If not, does that mean they keep tweaking a new model they want to release until it shows an improvement over the prior model?
- Would it be more useful to move toward a comparative rather than absolute ranking?
guelo
Have they ever said what the difference is between Sonnet and Opus? Are they trained differently? Different architectures? Is Sonnet a distillation? Is it just that Sonnet has fewer resources for inference?
None of the other labs are doing this kind of long-lived two-model series.
artursapek
I run a proofreading benchmark that tests how well models can find and fix errors in English text. They get several passes in a simple agent loop. Sonnet 5 is definitely better than Sonnet 4.6, but inferior on both quality and cost to GLM 5.1, GLM 5.2, Gemini 3.1 Flash, and Gemini 3.1 Pro. https://revise.io/errata-bench
ai_fry_ur_brain
Finally a model release where everyone is realising the scam. The world is healing (maybe).
joaohaas
Important to note that the cost graphs are heavily distorted. The Agentic Search one, for example, is divided into 3 'columns': $0-$2, $2-$5, and $5-$10.
And yet the $2-$5 section is the widest, even though it only contains a single point.
I can't even say if this is making the product look better or not, but it sure is weird. Maybe Claude just hallucinated those splits xD
tensegrist
there was a vibecoded prediction market–style page that was put up yesterday (?) that got the date exactly right i think
PeterStuer
Anyone else feel like Opus 4.8 got significantly dumber over the last 2 weeks?
kvetching
GLM 5.2 is better and cheaper. Maybe they are trying to embarrass Trump by making it look like we are losing to China.
Scroll_Swe
I don't pay so I'm glad for the upgrade. I usually use Gemini, Mistral Le Chat (Vibe...) or Deepseek as they have way more generous free limits and I can basically spam forever.
docheinestages
Is it just me or is there a huge difference between how much one can accomplish in a 5-hour window with GPT 5.5 on xhigh versus any Claude model?
jchw
American AI company status: We are now bragging about how bad our models are unironically.
Okay.
_pdp_
Too expensive?
gverrilla
Is this the default model for non-paying users? If so, that could be an interesting move in the competition for this segment.
andrewchambers
The whole fable fiasco really soured me on Anthropic. This just looks disappointing by comparison.
ekjhgkejhgk
In effective terms they're lowering prices.
micromacrofoot
So they repackaged Fable and added "don't scare the government" to the prompt
Getchowned
Fable soon please.
moomin
I feel like this is a bit of a disappointment. Sonnet 4 was a clear step above Opus 3.x, while this is a lot muddier.
mesmertech
OK, that's a one-month clock to the next Opus model at least, so that's a silver lining to a meh model.
stackedinserter
"Our new model is proudly dumber now!"
varispeed
What is the point if it is one Trump brain fart away from being blocked?
lucynight
AMAZING