Comments (123)
- Tiberium: I checked the current speed over the API, and so far I'm very impressed. Of course models are usually not as loaded on release day, but right now:
  - The older GPT-5 Mini is about 55-60 tokens/s on the API normally, and 115-120 t/s with service_tier="priority" (2x cost).
  - GPT-5.4 Mini averages about 180-190 t/s on the API. Priority currently does nothing for it.
  - GPT-5.4 Nano is at about 200 t/s.
  To put this into perspective, Gemini 3 Flash is about 130 t/s on the Gemini API and about 120 t/s on Vertex.
  This is raw tokens/s for all models; it doesn't exclude reasoning tokens, but I ran the models with none/minimal effort where supported.
  And quick price comparisons (input/output, $ per million tokens):
  - Claude: Opus 4.6 is $5/$25, Sonnet 4.6 is $3/$15, Haiku 4.5 is $1/$5
  - GPT: 5.4 is $2.50/$15 ($5/$22.50 for >200K context), 5.4 Mini is $0.75/$4.50, 5.4 Nano is $0.20/$1.25
  - Gemini: 3.1 Pro is $2/$12 ($3/$18 for >200K context), 3 Flash is $0.50/$3, 3.1 Flash Lite is $0.25/$1.50
- simonw: Here's a grid of pelicans for the different models and reasoning levels: https://static.simonwillison.net/static/2026/gpt-5.4-pelican...
- pscanf: I quite like the GPT models when chatting with them (in fact, they're probably my favorites), but for agentic work I've only had bad experiences with them.
  They're incredibly slow (via the official API or OpenRouter), but most of all they seem not to understand the instructions I give them. I'm sure I'm _holding them wrong_, in the sense that I'm not tailoring my prompt for them, but most other models have no problem with the exact same prompt.
  Does anybody else have a similar experience?
- BoumTAC: To me, mini releases matter much more, and better reflect real progress, than SOTA models.
  The frontier models have become so good that it's getting almost impossible to notice meaningful differences between them.
  Meanwhile, when a smaller / less powerful model gets a new version, the jump in quality is often massive, to the point where we can now use them 100% of the time in many cases.
  And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.
- mikkelam: Why are we treating LLM evaluation like a vibe check rather than an engineering problem?
  Most "Model X > Model Y" takes on HN these days (and everywhere) seem based on an hour of unscientific manual prompting. Are we actually running rigorous, version-controlled evals, or just making architectural decisions based on whether a model nailed a regex on the first try this morning?
- HugoDias: According to their benchmarks, GPT-5.4 Nano > GPT-5 Mini in most areas, but I'm noticing the models are getting more expensive, not cheaper:
  - GPT-5 mini: Input $0.25 / Output $2.00
  - GPT-5 nano: Input $0.05 / Output $0.40
  - GPT-5.4 mini: Input $0.75 / Output $4.50
  - GPT-5.4 nano: Input $0.20 / Output $1.25
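The price jump in the list above is easy to quantify. A minimal sketch, assuming the listed per-million-token rates and an illustrative request of 10K input + 2K output tokens:

```python
def cost(in_tokens, out_tokens, in_rate, out_rate):
    """Per-request cost in dollars; rates are $ per million tokens."""
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

# Sample request: 10K input + 2K output tokens (illustrative sizes)
old_mini = cost(10_000, 2_000, 0.25, 2.00)   # GPT-5 mini   -> $0.0065
new_mini = cost(10_000, 2_000, 0.75, 4.50)   # GPT-5.4 mini -> $0.0165
old_nano = cost(10_000, 2_000, 0.05, 0.40)   # GPT-5 nano
new_nano = cost(10_000, 2_000, 0.20, 1.25)   # GPT-5.4 nano
```

At this input/output mix, 5.4 mini is about 2.5x the per-request cost of 5 mini, and 5.4 nano about 3.5x the cost of 5 nano; the exact multiple shifts with the input/output ratio.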
- ibrahim_h: The OSWorld numbers are kind of getting lost in the pricing discussion, but IMO that's the most interesting part. Mini at 72.1% vs the 72.4% human baseline is basically noise, so why not just use mini by default unless you're hitting specific failure modes?
  Also, context bleed into nano subagents in multi-model pipelines: I've seen orchestrators that just forward the entire message history by default (or something like messages[-N:] without any real budgeting), so your "cheap" extraction step suddenly runs with 30-50K tokens of irrelevant context. And then what's even the point? You've eaten the latency/cost win and added truncation risk on top.
  Has anyone actually measured where that cutoff is in practice? At what context size does nano stop being meaningfully cheaper/faster in real pipelines, not benchmarks?
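The budgeting gap described above (messages[-N:] with no token accounting) can be sketched minimally. This is an illustration, not any orchestrator's real code: `count_tokens` is a crude stand-in for a proper tokenizer, and the message dict shape is an assumption.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: roughly 4 characters per token. Use a real
    # tokenizer (e.g. tiktoken) in practice.
    return max(1, len(text) // 4)

def trim_to_budget(messages, budget_tokens):
    """Keep the most recent messages that fit within budget_tokens,
    always preserving the first (system) message."""
    system, rest = messages[0], messages[1:]
    budget = budget_tokens - count_tokens(system["content"])
    kept = []
    for msg in reversed(rest):          # walk newest -> oldest
        need = count_tokens(msg["content"])
        if need > budget:
            break
        kept.append(msg)
        budget -= need
    return [system] + list(reversed(kept))  # restore chronological order
```

Compared with forwarding messages[-N:], this caps the subagent's input at a known token cost, so the "cheap" extraction step stays cheap regardless of how long the parent conversation has grown.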
- XCSme: It's odd that, on many benchmarks, including mine [0], Nano does better than Mini.
  5.4 mini seems to struggle with consistency; even with temperature 0 it sometimes gives the correct response, sometimes a wrong one...
  [0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...
- michaelgdwn: The Nano tier is the one I'm watching. For agent workflows where you're making dozens of LLM calls per task, the cost per call matters more than peak capability. Would be interesting to see benchmarks on function calling latency specifically — that's what matters for agents.
- technocrat8080: 5.4 Mini's OSWorld score is a pleasant surprise. When SOTA scores were still ~30-40, models were too slow and inaccurate for realtime computer-use agents (RIP Operator/Agent). Curious if anyone's been using these in production.
- cbg0: Based on SWE-bench, it seems like 5.4 mini at high effort is roughly equal to GPT-5.4 at low effort in terms of accuracy and price, but the latency for mini is considerably higher: 254 seconds vs 171 seconds for GPT-5.4. Probably a good option to run at lower effort levels to keep costs down for simpler tasks. Long-context performance is also not great.
- tintor: Several customer testimonials for GPT-5.4 Mini have em dashes in them. Did GPT write them?
- nicpottier: I've been struggling to find a reasonably priced model to use with my toy openclaw instance. Opus 4.6 felt kind of magical, but that's just too expensive, and I'm not risking my Max subscription for it.
  GPT-5.4 mini is the first alternative that is both affordable and decent. Pretty impressed. On a $20 Codex plan I think I'm pretty set, and the value is there for me.
- Rapzid: Oh... I thought maybe these would be upgrades to gpt-4.1, gpt-4.1-mini, etc. But the latency is way too high compared to the 400-600. Yeah, they're different models, etc., but the naming is confusing.
- fastpdfai: One thing I really want to figure out is which model to use, and how, to process tons of PDFs very fast and very accurately, for predicting invoice dates, accrual accounting, and other accounting-related purposes. So: a decently smart model that's really good at PDF and image reading while still being very fast.
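For the high-volume PDF use case above, throughput usually comes from fanning requests out in parallel rather than from any single fast call. A hedged sketch: `extract_text` and `call_model` are hypothetical stand-ins for your PDF/OCR layer and API client, and the model name and prompt are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def process_invoice(path, extract_text, call_model):
    # extract_text: your PDF/OCR layer (e.g. pypdf, or vision input for scans)
    text = extract_text(path)
    prompt = f"Extract the invoice date as YYYY-MM-DD from this invoice:\n\n{text}"
    return path, call_model("gpt-5.4-nano", prompt)

def process_all(paths, extract_text, call_model, workers=16):
    # Fan out over many documents; each call is independent, so
    # concurrency (bounded by your rate limits) sets the throughput.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_invoice, p, extract_text, call_model)
                   for p in paths]
        return dict(f.result() for f in futures)
```

With a fast small model at ~200 t/s per call, the wall-clock time for a batch is roughly (batch size / workers) x per-call latency, so the worker count matters as much as the model choice.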
- beklein: As a big Codex user with many smaller requests, this is the highlight: "In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost." Plus, subagent support will be huge.
- ryao: I will be impressed when they release the weights for these and older models as open source. Until then, this is not that interesting.
- dac: i want 5.4 nano to decide whether my prompt needs 5.4 xhigh and route to it automatically
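The routing wish above is straightforward to prototype: use the cheap model as a classifier, then dispatch. A minimal sketch under stated assumptions: `complete(model, prompt)` is a hypothetical wrapper around your API client, and the model names, labels, and router prompt are illustrative.

```python
ROUTER_PROMPT = (
    "Classify the difficulty of the request below. "
    "Answer with exactly one word: EASY or HARD.\n\nRequest: "
)

def route(prompt, complete):
    # Ask the cheap model to triage first...
    label = complete("gpt-5.4-nano", ROUTER_PROMPT + prompt).strip().upper()
    # ...then escalate hard requests to the big model (in practice you'd
    # also pass a high reasoning-effort setting here).
    model = "gpt-5.4" if label == "HARD" else "gpt-5.4-nano"
    return complete(model, prompt)
```

The trade-off: every request now pays one extra nano round-trip, so this only wins if the router is much cheaper than the default model and misclassification is rare.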
- jbellis: Benchmarking these now. Preregistering my predictions:
  Mini: better than Haiku but not as good as Flash 3, especially at reasoning=none.
  Nano: worse than Flash 3 Lite. Probably better than Qwen 3.5 27b.
- 6thbit: Looking at the long-context benchmark results for these, it sounds like they're best suited to mini-sized context windows too.
  Is there any harness with an easy way to pick a model for a subagent based on the context size the subagent may need?
- bananamogul: They could call them something like “sonnet” and “haiku”, maybe.
- powera: I've been waiting for this update. For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models will do even more, and get closer to 100% accuracy.
  The prices are up 2-4x compared to GPT-5 mini and nano. Were those models just loss leaders, or are these substantially larger/better?
- kseniamorph: wow, not a bad result on the computer-use benchmark for the mini model. for example, Claude Sonnet 4.6 shows 72.5%, almost on par with GPT-5.4 mini (72.1%), but Sonnet costs 4x more on input and 3x more on output.
- simianword: why isn't nano available in Codex? could be used for ingesting huge amounts of logs and other such things
- yomismoaqui: Not comparing with equivalent models from Anthropic or Google, interesting...
- machinecontrol: What's the practical advantage of using a mini or nano model versus the standard GPT model?
- casey2: I googled all the testimonial names and they are all LinkedIn mouthpieces.
- derefr: OpenAI don't talk about the "size" or "weights" of these models any more. Anyone have any insight into how resource-intensive these Mini/Nano variants actually are at this point?
  I assume OpenAI continue to use words like "mini" and "nano" in the names of these model variants to imply that they reserve the smallest possible resource-units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.
  I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective and from OpenAI's) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.
  But of course, that question is only interesting / only has a non-trivial answer if these models are small enough that it's actually possible to run them on hardware that costs less to acquire than a year's querying quota for the hosted version.
- varispeed: I stopped paying attention to GPT-5.x releases; they seem to have been severely dumbed down.
- morpheos137: i switched to claude when i found chatgpt would argue with just about anything i said, even when it was wrong. they have over-optimized anti-sycophancy. i want a model that simulates critical thinking, not one that repeats half-baked, often incomplete dogmas. the chatgpt 5.x range is extraordinarily powerful but also extraordinarily frustrating to use for anything creative or productive that is original, in my opinion. claude is basically able to think critically while being neither sycophantic nor argumentative most of the time, in my opinion, with appropriate user prompting. recent chatgpts seem to fight me every step of the way when not doing boilerplate. i don't want to waste my time fighting a tool.
- reconnecting: All three ChatGPT models (Instant, Thinking, and Pro) have a new knowledge cutoff of August 2025.
  Seriously?
- system2: I'm feeling the version fatigue. I can't deal with their incremental BS versions.
- miltonlost: Does it still help drive people to psychosis, murder, and suicide? Where's the benchmark for that?
- beernet: Crazy how OAI is way behind now, and the only one to blame is Sam, his ego, and his lust for influence. Their downward trajectory of paying accounts since "the move" (the DoW deal) is an open secret. If you had placed a new CEO at OAI six months ago and told him to destroy the company, it would have been hard for that CEO to do a better job of it than Sam did. He should have left when he was let go, but decided to go full Greg and MAGA instead. Here we are. Go Dario.