GLM-5.2 is a step change for open agents



vantareed

Comments (86)

jerojero
Open weight models from Chinese labs tend to be significantly cheaper. I think they're absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that. It's increasingly feeling, to me, that there's a gap building up between haves and have-nots. But then we get news of these open weight models that are reasonably priced in inference with reasonable capabilities. Yes, they take maybe 6-9 months to get there, but tbh that's not a bad trade-off at all.
christophilus
I've been working with Deepseek V4 Flash (with opencode as the harness). It's been almost indistinguishable from Codex / Claude Code for me. I'm sure I'll run into problems when I get to a stickier ticket to tackle. But so far, it's been quite good, and I find it writes straightforward code. I do think the Chinese models are good enough for an 80/20 rule use case.
aunty_helen
I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.
guybedo
GLM-5.2 has been a step change in how fast I can burn through tokens. I subscribed to their max plan to try it out. It counted me 700M tokens and drained my weekly quota in under 2 days. Quota just reset less than 24h ago and I'm already at >60% weekly quota usage. For reference, the kind of work I did would have used somewhere between 3% and 5% of Codex max or Claude max. The model is good, the plan is a scam.
fraywing
It feels like the gap is closing from an intelligence perspective. Or at least doing some kind of log flattening. Been playing with GLM 5.2 in different contexts. It's less good if you don't max out thinking, but at xhigh it's been able to solve most problems I was throwing at Opus in about the same amount of time (via OpenRouter). Wild time to be alive.
mlmonkey
Here are the numbers from their bar chart:

1. SWE-bench Pro
Model             Score (%)
GLM-5.2           62.1
GLM-5.1           58.4
Claude Opus 4.8   69.2
GPT-5.5           58.6
Gemini 3.1 Pro    54.2

2. Terminal-Bench 2.1
Model             Score (%)
GLM-5.2           81.0
GLM-5.1           63.5
Claude Opus 4.8   85.0
GPT-5.5           84.0
Gemini 3.1 Pro    74.0

3. NL2Repo
Model             Score (%)
GLM-5.2           48.9
GLM-5.1           42.7
Claude Opus 4.8   69.7
GPT-5.5           50.7
Gemini 3.1 Pro    33.4

4. DeepSWE
Model             Score (%)
GLM-5.2           46.2
GLM-5.1           18.0
Claude Opus 4.8   58.0
GPT-5.5           70.0
Gemini 3.1 Pro    10.0

5. ProgramBench
Model             Score (%)
GLM-5.2           63.7
GLM-5.1           50.9
Claude Opus 4.8   71.9
GPT-5.5           70.8
Gemini 3.1 Pro    39.5

6. MCP-Atlas
Model             Score (%)
GLM-5.2           77.0
GLM-5.1           71.8
Claude Opus 4.8   77.8
GPT-5.5           75.3
Gemini 3.1 Pro    69.2

7. Tool-Decathlon
Model             Score (%)
GLM-5.2           48.2
GLM-5.1           40.7
Claude Opus 4.8   59.9
GPT-5.5           55.6
Gemini 3.1 Pro    48.8

8. Humanity's Last Exam
Model             Base Score (%)   Score w/ Tools (%)
GLM-5.2           40.5             54.7
GLM-5.1           31.0             52.3
Claude Opus 4.8   49.8             57.9
GPT-5.5           41.4             52.2
Gemini 3.1 Pro    45.0             51.4

Seems to be handily beating Gemini 3.1 Pro. What _is_ Google DeepMind doing (other than bleeding talent to A\)?
timcobb
Can people share their GLM and open model setups in general please? What provider do you use? Why do you trust it with serving full quality? What harness do you use? Why do you trust it not to have malware (most harnesses are TS apps)? I am just trying GLM 5.1 from Nvidia build in opencode and would love to hear how you all do it, thanks.
neosat
I've been using GLM 5.2 recently (company hosted, for non-coding tasks) and it's been strong and reliable. There are areas where GPT 5.5 and Opus 4.x still feel marginally better, but only marginally. For most tasks, if GLM 5.2 is the only model I have to use, I'm productive and happy. This was not true before GLM 5.2. No doubt in my mind that the gap is closing quickly, and for most tasks that are not very specialized, open models will be usably on par with flagship closed models and have an edge factoring in cost. For coding I still use 5.5 w/ Codex and prefer that to other model + harness combinations.
yogthos
It's by far the most competent open model I've tried yet. It's a bit slower than Claude, but in terms of coding capability it seems to get comparable results at least for the work I'm doing.
themgt
I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context. But the reasoning traces became increasingly hilarious, with it getting confused and going in loops, doubting itself. I began to feel almost sad; it was like listening to the internal monologue of someone with an anxiety disorder. It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts ("This is very confusing.", "OH WAIT"), seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug. Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.
newaccountman2
5.1 and Qwen 3.6 are great too IMO
seany
What's the current best for ablation? Specifically chemistry and red-team/netsec?
dools
Is z.ai 2 better than x.ai?
citizenpaul
I've been using glm5 since its release and still prefer it to glm5.1 and, so far, to glm5.2. Perhaps it is just my harness and workflow, but the older model still seems to work better. Also the token cost is significantly lower. I rarely spend more than $20 a week with a $50 cap. Not even half of Claude's ambiguous minimum $200 a month plan.
modgate
[flagged]
Balinares
I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.