Comments (300)
- minimaxir: The marquee feature is obviously the 1M context window, compared to the ~200k other models support, sometimes with an extra cost for generation beyond 200k tokens. Per the pricing page, there is no additional cost for tokens beyond 200k: https://openai.com/api/pricing/ Also per pricing, GPT-5.4 ($2.50/M input, $15/M output) is much cheaper than Opus 4.6 ($5/M input, $25/M output), and Opus has a penalty for its beta >200k context window. I am skeptical whether the 1M context window will provide material gains, as current Codex/Opus show weaknesses once their context windows are mostly full, but we'll see. Per the updated docs (https://developers.openai.com/api/docs/guides/latest-model), it supersedes GPT-5.3-Codex, which is an interesting move.
- creamyhorror: I've only used 5.4 for 1 prompt so far (edit: 3 at high now); reasoning was set to extra high, and it took really long. The task was to analyse my codebase and write an evaluation on a topic, but I found its writing and analysis thoughtful, precise, and surprisingly clearly written, unlike 5.3-Codex. It feels very lucid and uses human phrasing. It might be my AGENTS.md requiring clearer, simpler language, but at least 5.4 is doing a good job of following the guidelines. 5.3-Codex wasn't so great at simple, clear writing.
- kgeist:
  > Today, we're releasing <..> GPT‑5.3 Instant
  > Today, we're releasing GPT‑5.4 in ChatGPT (as GPT‑5.4 Thinking)
  > Note that there is not a model named GPT‑5.3 Thinking
  They held out for eight months without a confusing versioning scheme :)
- elmean: Wow, insane improvements in targeting systems for military targets over children.
- Chance-Device: I'm sure the military and security services will enjoy it.
- smoody07: Surprised to see every chart limited to comparisons against other OpenAI models. What does the industry comparison look like?
- gavinray: The "RPG Game" example on the blog post is one of the most impressive demos of autonomous engineering I've seen. It's very similar to "Battle Brothers", and the fact that RPG games require art assets, AI for enemy moves, and a host of other logical systems makes it all the more impressive.
- hmokiguess: They hired the dude from OpenClaw, they've had Jony Ive for a while now, give us something different!
- egonschiele: The actual card is here: https://deploymentsafety.openai.com/gpt-5-4-thinking/introdu... (the link currently goes to the announcement).
- mattas: "GPT‑5.4 interprets screenshots of a browser interface and interacts with UI elements through coordinate-based clicking to send emails and schedule a calendar event." They show an example of 5.4 clicking around in Gmail to send an email. I still think this is the wrong interface for interacting with the internet. Why not use the Gmail APIs? No need for screenshot interpretation or coordinate-based clicking.
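For reference, the API route the comment suggests is tiny by comparison. A minimal sketch, assuming the standard Gmail REST API: `users.messages.send` expects a base64url-encoded RFC 2822 message under a `raw` key. The addresses and the commented-out send call (which needs google-api-python-client plus OAuth credentials) are illustrative, not from the article.

```python
import base64
from email.mime.text import MIMEText

def build_gmail_payload(sender: str, to: str, subject: str, body: str) -> dict:
    """Build the request body for the Gmail API messages.send endpoint:
    a base64url-encoded RFC 2822 message under the 'raw' key."""
    msg = MIMEText(body)
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")}

# With google-api-python-client and OAuth credentials, an agent would call:
#   service = build("gmail", "v1", credentials=creds)
#   service.users().messages().send(
#       userId="me",
#       body=build_gmail_payload("me@example.com", "you@example.com",
#                                "Hello", "Sent via API, no screenshots needed"),
#   ).execute()
```

No pixels, no coordinates, and the payload is verifiable before it ever leaves the machine.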
- yanis_t: These releases are lacking something. Yes, they optimised for benchmarks, but it's just not all that impressive anymore. It's time for a product, not a marginally improved model.
- twtw99: If you don't want to click in, here's an easy comparison with the other 2 frontier models: https://x.com/OpenAI/status/2029620619743219811?s=20
- nickysielicki: Can anyone compare the $200/mo Codex usage limits with the $200/mo Claude usage limits? It's extremely difficult to get a feel for whether switching between the two will result in hitting limits more or less often, and it's difficult to find discussion online about this. In practice, if I buy $200/mo Codex, can I basically run 3 Codex instances simultaneously in tmux, like I can with Claude Code Pro Max, all day every day, without hitting limits?
- timpera:
  > Steerability: Similarly to how Codex outlines its approach when it starts working, GPT‑5.4 Thinking in ChatGPT will now outline its work with a preamble for longer, more complex queries. You can also add instructions or adjust its direction mid-response.
  This was definitely missing before, and a frustrating difference when switching between ChatGPT and Codex. Great addition.
- denysvitali: Article: https://openai.com/index/introducing-gpt-5-4/
  gpt-5.4: $2.50/M input tokens, $0.25/M cached input, $15/M output
  gpt-5.4-pro: $30/M input tokens, $180/M output
  Wtf
- jryio: 1 million tokens is great until you notice the long-context scores fall off a cliff past 256K, and the rest is basically vibes and auto-compacting.
- ZeroCool2u: Bit concerning that we see significantly worse results in some cases when enabling thinking, especially for Math, but also on the browser-agent benchmark. Not sure whether this is more concerning for the test-time-compute paradigm or for the underlying model itself. Maybe I'm misunderstanding something, though? I'm assuming 5.4 and 5.4 Thinking are the same underlying model and that's not just marketing.
- nickandbro: Beat Simon Willison ;) https://www.svgviewer.dev/s/gAa69yQd Not the best pelican compared to Gemini 3.1 Pro, but I'm sure it does remarkably better at coding or Excel, given those are part of its measured benchmarks.
- prydt: I no longer want to support OpenAI at all, regardless of benchmarks or real-world performance.
- jstummbillig: Inline poll: what reasoning levels do you work with? This is becoming increasingly unclear to me, because the more interesting work is the agent going off for 30+ minutes on high / extra high (it's mostly one of the two), and that's a long time to wait and an unfeasible amount of code to A/B.
- dandiep: Anyone know why OpenAI hasn't released a new model for fine-tuning since 4.1? Next month it'll be a year since their last model update for fine-tuning.
- XCSme: Seems to be quite similar to 5.3-Codex, but somehow almost 2x more expensive: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...
- rbitar: I think the most exciting change announced here is the use of tool search to dynamically load tools as needed: https://developers.openai.com/api/docs/guides/tools-tool-sea...
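The pattern that comment links to can be sketched in a few lines; the registry, stopword list, and matching rule here are illustrative stand-ins, not OpenAI's actual API. The idea: rather than sending every tool schema with each request, the model first searches a registry and only the matching tool definitions get loaded into context.

```python
# Illustrative tool registry: name -> one-line description.
TOOLS = {
    "create_invoice": "Create and send a customer invoice",
    "refund_payment": "Refund a completed payment",
    "search_orders": "Search past orders by customer or date",
}

STOPWORDS = {"a", "an", "the", "and", "or", "to"}

def search_tools(query: str, registry: dict[str, str]) -> list[str]:
    """Return tool names whose name or description shares a word with the query."""
    terms = set(query.lower().split()) - STOPWORDS
    return [
        name for name, desc in registry.items()
        if terms & (set(desc.lower().split()) | set(name.split("_")))
    ]

# Only the matched subset is attached as tool schemas for this turn:
print(search_tools("refund a payment", TOOLS))  # ['refund_payment']
```

With hundreds of registered tools, this keeps per-turn context cost proportional to the tools actually relevant to the query rather than the full catalog.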
- jcmontx: 5.4 vs 5.3-Codex? Which one is better for coding?
- paxys: "Here's a brand new state-of-the-art model. It costs 10x more than the previous one because it's just so good. But don't worry, if you don't want all this power you can continue to use the older one." A couple of months later: "We are deprecating the older model."
- bob1029: I was just testing this with my Unity automation tool, and the performance uplift from 5.2 seems to be substantial.
- daft_pink: I've officially got model fatigue. I don't care anymore.
- bazmattaz: Anyone else feel that it's exhausting keeping up with the pace of new model releases? I swear every other week there's a new release!
- 7777777phil: 83% win rate over industry professionals across 44 occupations. I'd believe it on those specific tasks. Near-universal adoption in software still hasn't moved DORA metrics: the model gets better every release, but the output doesn't follow. I took a closer look at those productivity metrics this week: https://philippdubach.com/posts/93-of-developers-use-ai-codi...
- strongpigeon: It's interesting that they charge more for the >200k token window, but the benchmark score seems to drop significantly past that. That's judging from the Long Context benchmark score they posted, but perhaps I'm misunderstanding what it implies.
- cj: I use ChatGPT primarily for health-related prompts: looking at bloodwork, playing doctor for diagnosing minor aches/pains from weightlifting, etc. Interestingly, the "Health" category seems to report worse performance compared to 5.2.
- iamronaldo: Notably, 75% on OSWorld, surpassing humans at 72%... (a benchmark of how well models use operating systems)
- swingboy: Even with the 1M context window, it looks like these models drop off significantly at about 256k. Hopefully improving that is a high priority for 2026.
- nthypes: $30/M input and $180/M output tokens is nuts. Ridiculously expensive for not that great a bump in intelligence compared to other models.
- alpineman: No thanks. Already cancelled my sub.
- OsrsNeedsf2P: Does anyone know what website the "Isometric Park Builder" shown off here is?
- motbus3: Sam Altman can keep his model to himself. Not doing business with mass murderers.
- vicchenai: Honestly, at this point I just want to know if it follows complex instructions better than 5.1. The benchmark numbers stopped meaning much to me a while ago; real usage always feels different.
- koakuma-chan: Anyone else getting artifacts when using this model in Cursor? For example: `numerusformassistant to=functions.ReadFile մեկնաբանություն 天天爱彩票网站json {"path":`
- oytis: Everyone is mindblown in 3...2...1
- jesse_dot_id: ChatMDK
- ilaksh: Remember when everyone was predicting that GPT-5 would take over the planet?
- OutOfHere: What is with the absurdity of skipping "5.3 Thinking"?
- lostmsu: What is Pro exactly, and is it available in Codex CLI?
- HardCodedBias: We'll have to wait a day or two, maybe a week or two, to determine whether this is more capable at coding than 5.3, which seems to be the economically valuable capability at this time. In terms of writing and research, even Gemini, with a good prompt, is close to usable. That's likely not a differentiator.
- wahnfrieden: No Codex model yet.
- tmpz22: Does this improve Tomahawk missile accuracy?
- world2vec: Benchmarks barely improved, it seems.
- ignorantguy: It shows a 404 as of now.
- simianwords: What is the point of GPT Codex?
- iamleppert: I wouldn't trust any of these benchmarks unless they're accompanied by some sort of proof other than "trust me bro". Also, not including the parameters the models were run with (especially the other models) makes it hard to form fair comparisons. They need to publish, at minimum, the code and runner used to complete the benchmarks, plus logs. Not including the Chinese models is also obviously done to make it appear like they aren't as cooked as they really are.
- minimaxir: More discussion here on the blog post announcement, which has been confusingly penalized by Hacker News's algorithm: https://news.ycombinator.com/item?id=47265005
- leftbehinds: Some sloppy improvements.
- beernet: Sam really fumbled the top position in a matter of months, and spectacularly so. Wow. It appears that people are much more excited by Anthropic and Google releases, and there are good reasons for that which were absolutely avoidable.