
Comments (173)

  • bgirard
    > Using the develop web game skill and preselected, generic follow-up prompts like "fix the bug" or "improve the game", GPT‑5.3-Codex iterated on the games autonomously over millions of tokens.
    I wish they would share the full conversation, token counts and more. I'd like to have a better sense of how they normalize these comparisons across versions. Is this a 3-prompt 10m token game? A 30-prompt 100m token game? Are both models using similar prompts/token counts?
    I vibe coded a small Factorio web clone [1] that got pretty far using the models from last summer. I'd love to compare against this.
    [1] https://factory-gpt.vercel.app/
  • granzymes
    I think Anthropic rushed out the release before 10am this morning to avoid having to put in comparisons to GPT-5.3-codex!
    The new Opus 4.6 scores 65.4 on Terminal-Bench 2.0, up from GPT-5.2-codex's 64.7.
    GPT-5.3-codex scores 77.3.
  • itay-maman
    Something that caught my eye from the announcement:
    > GPT‑5.3‑Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training
    I'm happy to see the Codex team moving to this kind of dogfooding. I think this was critical for Claude Code to achieve its momentum.
  • xiphias2
    > GPT‑5.3-Codex is the first model we classify as High capability for cybersecurity-related tasks under our Preparedness Framework, and the first we’ve directly trained to identify software vulnerabilities. While we don’t have definitive evidence it can automate cyber attacks end-to-end, we’re taking a precautionary approach and deploying our most comprehensive cybersecurity safety stack to date. Our mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines including threat intelligence.
    While I love Codex and believe it's an amazing tool, I believe their preparedness framework is out of date. As it becomes more and more capable of vibe coding complex apps, it's getting clear that the main security issues will come from having more and more security-critical software vibe coded.
    It's great to look at how well Codex can be used against software written by humans, but it's getting more important to measure the opposite: how well humans (or their own software) are able to infiltrate complex systems written mostly by Codex, and to get better on that scale.
    In simpler terms: Codex should write secure software by default.
  • minimaxir
    I remember when AI labs coordinated so they didn't push major announcements on the same day to avoid cannibalizing each other. Now we have AI labs pushing major announcements within 30 minutes.
  • PieUser
    How'd they both release at the same time? Insiders?
  • tosh
    Terminal Bench 2.0
    | Name               | Score |
    |--------------------|-------|
    | OpenAI Codex 5.3   | 77.3  |
    | Anthropic Opus 4.6 | 65.4  |
  • nananana9
    I've been listening to the insane 100x productivity gains you all are getting with AI and "this new crazy model is a real game changer" for a few years now, so I think it's about time I asked:
    Can you guys point me to a single useful, majority LLM-written, preferably reliable program that solves a non-trivial problem that hasn't been solved a bunch of times before in publicly available code?
  • dawidg81
    May AI not write the code for me. May I at least understand what it has "written". AI help is good, but don't replace real programmers completely. I've had enough of copy-pasting code I don't understand. What if one day AI falls down and there are no real programmers left to write the software? AI for help is good, but I don't want AI to write whole files into my project. Then something may break and I won't know what's broken. I've experienced it many times already. I told the AI to write something for me, and the code was not working at all: it compiled normally, but the program was buggy. Or when I was making a bigger project with ChatGPT only, it was mostly working, but after a longer time, as I kept prompting more and more things, everything broke.
  • trilogic
    When 2 multi-billion giants advertise on the same day, it is not competition but rather a sign of struggle and survival. With all the power of the "best artificial intelligence" at your disposal, a lot of capital, and all the brilliant minds, THIS IS WHAT YOU COULD COME UP WITH? Interesting
  • karmasimida
    For those who care:
    GPT-5.3-Codex dominates terminal coding with a roughly 12% lead (Terminal-Bench 2.0), while Opus 4.6 retains the edge in general computer use by 8% (OSWorld).
    Does anyone know the difference between OSWorld and OSWorld Verified?
  • morleytj
    The behind the scenes on deciding when to release these models has got to be pretty insanely stressful if they're coming out within 30 minutes-ish of each other.
  • ffitch
    > our team was blown away by how much Codex was able to accelerate its own development
    they forgot to add “Can’t wait to see what you do with it”
  • modeless
    It's so difficult to compare these models because they're not running the same set of evals. I think literally the only eval variant that was reported for both Opus 4.6 and GPT-5.3-Codex is Terminal-Bench 2.0, with Opus 4.6 at 65.4% and GPT-5.3-Codex at 77.3%. None of the other evals were identical, so the numbers for them are not comparable.
  • foft
    Having used Codex a fair bit, I find it really struggles with … almost anything. However, using the equivalent ChatGPT model is fantastic. I guess it’s a matter of focus and being provided with a smaller set of code to tackle.
  • prng2021
    Did they post the knowledge cutoff date somewhere?
  • ponyous
    I think models are smart enough for most of the stuff, these little incremental changes barely matter now. What I want is the model that is fast.
  • jdthedisciple
    Gotta love how the game demo's page title is "threejs" – I guess the point was to demo its vibe-coding abilities anyway, but yea..
  • __mharrison__
    I never really used Codex (found it too slow), just 5.2, which I find to be an excellent model for my work. This looks like another step up.
    This week, I'm all local though, playing with opencode and running qwen3 coder next on my little spark machine. With the way these local models are progressing, I might move all my LLM work local.
  • tyfon
    I'm having a hard time parsing the OpenAI website.
    Anyone know if it is possible to use this model with opencode with the Plus subscription?
  • gwd
    gpt-5.3-codex isn't available on the API yet. From TFA:
    > We are working to safely enable API access soon.
  • Robin_f
    Anthropic mostly had an advantage in speed. It feels like with a 25% increase in speed with Codex 5.3, they are now losing that advantage as well.
  • GenerWork
    I find it very, very interesting how they demoed visuals in the form of the “soft SaaS” website and mentioned how it can do user research. Codex has usually lagged behind Claude and Gemini when it comes to UX, so I’m curious to see if 5.3 will take the lead in real world use. Perhaps it’ll be available in Figma Make now?
  • rustyhancock
    Anyone remember the dot-com era when you would see one provider claim the most miles of fibre and then later that week another would have the title?
  • davidmurdoch
    I've been using 5.2 the way they're describing the new use case for 5.3 this whole time.
  • imasliev
    GPT-5.2-Codex was so good on price/value; I hope 5.3 will not ruin the race with Claude
  • ecshafer
    Funny that this and Opus 4.6 released within minutes of each other. Each showing similar score improvements. Each claiming to be revolutionary.
  • kingstnap
    That was fast!
    I really do wonder what the chain is here. Did Sam see the Opus announcement and DM someone a minute later?
  • bryanhogan
    The most important question: Can it do Svelte now?
  • kingstnap
    > GPT‑5.3-Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems. We are grateful to NVIDIA for their partnership.
    This is hilarious lol
  • I_am_tiberius
    I'd like to know whether, and to what extent, customer prompts are being used illegally for training.
  • bg24
    I am on a max subscription for Claude, and hate the fact that OpenAI have not figured out that $20 => $200 is a big jump. Good luck to them. In terms of model, just last night, Codex 5.2 solved a problem for me which other models were going round and round. Almost same instructions. That said, I still plan to be on $100 Claude (overall value across many tasks, ability to create docs, co-work), and may bump up OpenAI subscription to the next tier should they decide to introduce one. Not going to $200 even with 5.3, unless my company pays for it.
  • binsquare
    At first try it solved a problem that 5.2 couldn't previously.
    Seems to be slower/thinks longer.
  • simianwords
    Any notes on pricing?
  • kopollo
    Where is the google?
  • maheshrijal
    It seems fast!
  • roya51788
    What are the benchmarks against Opus 4.6?
  • edem
    So can I use this from Opencode? Because Anthropic started to enforce their TOS to kill the Opencode integration
  • hubraumhugo
    Anybody else not seeing it available in Codex app or CLI yet (with Plus)?
  • heraldgeezer
    Anthropic and GPT: 2 new models at once?
  • wahnfrieden
    Pelican seems much worse than the Opus 4.6 one (though the bicycle is more accurate):
    https://gist.github.com/simonw/a6806ce41b4c721e240a4548ecdbe...
  • OutOfHere
    It is absurd to release 5.3-Codex before first releasing 5.3.
    Also, there is no reason for OpenAI and Anthropic to be trying to one-up each other's releases on the same day. It is hell for the reader.
  • raincole
    Almost like Anthropic and OpenAI are trying to front run each other
  • mannanj
    Stolen from the Opus 4.6 thread:
    GPT-5.3-Codex was so good it became my wife!
  • maxpert
    Is it just me, or is Sam being the absolute sore loser he is and trying to steal Opus's thunder?
  • shibeprime
    I know we just got a reset and a 2× bump with the native app release, but shipping 5.3 with no reset feels mismatched. If I’d known this was coming, I wouldn’t have used up the quota on the previous model.