
Comments (181)

  • beklein
    I love this! I use coding agents to generate web-based slide decks where “master slides” are just components, and we already have rules + assets to enforce corporate identity. With content + prompts, it’s straightforward to generate a clean, predefined presentation. What I’d really want on top is an “improv mode”: during the talk, I can branch off based on audience questions or small wording changes, and the system proposes (say) 3 candidate next slides in real time. I pick one, present it, then smoothly merge back into the main deck. Example: if I mention a recent news article / study / paper, it automatically generates a slide that includes a screenshot + a QR code link to the source, then routes me back to the original storyline. With realtime voice + realtime code generation, this could turn the boring old presenter view into something genuinely useful.
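    A rough sketch of how such an improv mode could work as a loop: branch on an audience tangent, generate a few candidate slides, let the presenter pick one, and splice back into the planned deck. All names here are hypothetical and generate_slide() stands in for a real model call, so this is an illustration rather than an existing tool:
      # Hypothetical sketch of the improv-mode loop described above; generate_slide()
      # stands in for whatever realtime model call actually produces slide markup.
      from dataclasses import dataclass

      @dataclass
      class Slide:
          title: str
          body_html: str

      def generate_slide(prompt: str) -> Slide:
          # Placeholder: a real version would call a fast code-gen model with the
          # corporate-identity components and assets as context.
          return Slide(title=prompt[:60], body_html=f"<section>{prompt}</section>")

      def improv_branch(tangent: str, deck: list[Slide], resume_at: int, n: int = 3) -> list[Slide]:
          """Propose n candidate slides for an audience tangent, let the presenter pick
          one, splice it into the deck, and route back to the planned storyline."""
          candidates = [
              generate_slide(f"Slide about: {tangent}. Include a source screenshot and a QR-code link.")
              for _ in range(n)
          ]
          for i, c in enumerate(candidates):
              print(f"[{i}] {c.title}")
          choice = int(input("Pick a candidate slide: "))
          # Insert the chosen slide before the next planned slide so the talk
          # merges back into the original storyline.
          return deck[:resume_at] + [candidates[choice]] + deck[resume_at:]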
  • postalcoder
    First thoughts using gpt-5.3-codex-spark in Codex CLI: blazing fast but it definitely has a small model feel.
    It's tearing up bluey bench (my personal agent speed benchmark), which is a file system benchmark where I have the agent generate transcripts for untitled episodes of a season of Bluey, perform a web search to find the episode descriptions, and then match the transcripts against the descriptions to generate file names and metadata for each episode.
    Downsides:
    - It has to be prompted to do actions in my media library AGENTS.md that the larger models adhere to without additional prompting.
    - It's less careful with how it handles context, which means that its actions are less context efficient. Combine that with the smaller context window and I'm seeing frequent compactions.
    Bluey Bench* (minus transcription time):
    - Codex CLI, gpt-5.3-codex-spark low: 20s
    - Codex CLI, gpt-5.3-codex-spark medium: 41s
    - Codex CLI, gpt-5.3-codex-spark xhigh: 1m 09s (1 compaction)
    - Codex CLI, gpt-5.3-codex low: 1m 04s
    - Codex CLI, gpt-5.3-codex medium: 1m 50s
    - Codex CLI, gpt-5.2 low: 3m 04s
    - Codex CLI, gpt-5.2 medium: 5m 20s
    - Claude Code, opus-4.6 (no thinking): 1m 04s
    - Antigravity, gemini-3-flash: 1m 40s
    - Antigravity, gemini-3-pro low: 3m 39s
    *Season 2, 52 episodes
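    For readers unfamiliar with the shape of this task, here is a minimal sketch of the matching step such a benchmark asks the agent to do. It is only an illustration, not postalcoder's actual harness, and the descriptions dict is a stand-in for the web-search step:
      # Pair transcript files with fetched episode descriptions by text similarity,
      # then derive names/metadata from the best match.
      from difflib import SequenceMatcher
      from pathlib import Path

      def best_match(transcript: str, descriptions: dict[str, str]) -> str:
          """Return the episode title whose description is most similar to the transcript."""
          return max(
              descriptions,
              key=lambda title: SequenceMatcher(None, transcript.lower(),
                                                descriptions[title].lower()).ratio(),
          )

      def name_episodes(transcript_dir: Path, descriptions: dict[str, str]) -> None:
          for path in sorted(transcript_dir.glob("*.txt")):
              title = best_match(path.read_text(), descriptions)
              # e.g. "untitled_07.txt" -> "S02E07 - <matched title>"
              print(f"{path.name} -> {title}")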
  • pjs_
    Continue to believe that Cerebras is one of the most underrated companies of our time. It's a dinner-plate sized chip. It actually works. It's actually much faster than anything else for real workloads. Amazing
  • perdomon
    This has been the industry standard for the last 20 minutes. I can't believe people are still using GPT-5.3-Codex.
  • jryio
    This is interesting for offloading "tiered" workloads / a priority queue with coding agents.
    If 60% of the work is "edit this file with this content" or "refactor according to this abstraction", then low-latency, high-token-throughput inference seems like a needed improvement.
    Recently someone made a Claude plugin to offload low-priority work to the Anthropic Batch API [1]. Also, I expect both Nvidia and Google to deploy custom silicon for inference [2].
    1: https://github.com/s2-streamstore/claude-batch-toolkit/blob/...
    2: https://www.tomshardware.com/tech-industry/semiconductors/nv...
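    A toy sketch of that tiering idea, with made-up tier names and priorities; a real setup might map the "interactive" tier to a Spark-class model and the "batch" tier to something like the Batch API linked above:
      # Priority-queue dispatch: tasks the user is waiting on go to the fast tier,
      # deeper work goes to the frontier model, background chores go to batch.
      import heapq
      from dataclasses import dataclass, field

      @dataclass(order=True)
      class Task:
          priority: int                          # 0 = user is actively waiting
          description: str = field(compare=False)

      def tier_for(task: Task) -> str:
          if task.priority == 0:
              return "interactive"   # small edits: latency matters most
          if task.priority <= 5:
              return "heavy"         # refactors: capability matters most
          return "batch"             # offline chores: cost matters most

      queue: list[Task] = []
      for t in (Task(0, "edit this file with this content"),
                Task(3, "refactor according to this abstraction"),
                Task(9, "regenerate all docstrings overnight")):
          heapq.heappush(queue, t)

      while queue:
          task = heapq.heappop(queue)
          print(f"{task.description!r} -> {tier_for(task)}")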
  • nikkwong
    > Our latest frontier models have shown particular strengths in their ability to do long-running tasks, working autonomously for hours, days or weeks without intervention.I have yet to see this (produce anything actually useful).
  • raahelb
    Interesting to note that the reduced latency is not just due to the improved model speed, but also because of improvements made to the harness itself:
    > "As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models [...] Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon."
    I wonder if all other harnesses (Claude Code, OpenCode, Cursor, etc.) can make similar improvements to reduce latency. I've been vibe coding (or doing agentic engineering) with Claude Code a lot for the last few days and I've had some tasks take as long as 30 minutes.
  • simonw
    My stupid pelican benchmark proves to be genuinely quite useful here: you get a visual representation of the quality difference between GPT-5.3-Codex-Spark and full GPT-5.3-Codex: https://simonwillison.net/2026/Feb/12/codex-spark/
  • kachapopopow
    Is this the first time one of the big 3 has used Cerebras? I've been waiting for this day...
  • mbm
    Works pretty well as a general-purpose computer. The speed is really enjoyable. Could replace some of my Claude Code use, actually. For coding, set it to xhigh and use it for personal tools or small projects.
    Example repo that Codex with Spark made in about 15 minutes for me, since `claude --resume` has been finicky lately: https://github.com/mzxrai/claude-sessions
  • mudkipdev
    Off topic but how is it always this HN user sharing model releases within a couple of minutes of their announcement?
  • pdeva1
    This seems closer to 5.1-mini, and it's tied to a Pro account. GLM 4.7 is available on-demand on Cerebras today [1] and performs better for cheaper...
    [1] https://www.cerebras.ai/blog/glm-4-7
  • alecco
    This could probably work amazingly with an orchestrator on 5.3-high and coding agents with Spark. But it would need some decent instructions for both.
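    One possible shape for that split, sketched with a placeholder call_model() and illustrative model names rather than real API calls: the slow model writes the plan and per-task instructions once, and Spark-class workers execute the pieces in parallel.
      from concurrent.futures import ThreadPoolExecutor

      def call_model(name: str, prompt: str) -> str:
          # Placeholder so the sketch runs end to end; a real version would invoke
          # the actual agent/CLI for the named model.
          return f"[{name}] {prompt}"

      def orchestrate(goal: str) -> list[str]:
          # Slow, smart orchestrator: decompose the goal into small, independent
          # edits with explicit acceptance criteria (its "decent instructions").
          plan = call_model("orchestrator-high", f"Break '{goal}' into independent subtasks")
          subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
          # Fast workers: execute the subtasks concurrently.
          with ThreadPoolExecutor(max_workers=4) as pool:
              return list(pool.map(lambda t: call_model("spark-worker", t), subtasks))

      print(orchestrate("add input validation to the settings form"))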
  • ttul
    Great move by OpenAI. With coding agents, if you have access to a fast and cheap model, you can afford to let it rip, making lots of mistakes, and iterate until it gets things right. With the right scaffolding (AGENTS.md, SKILLS.md, etc.), a fast and light model can do great things. And when it's done, you can still have the heavyweight model come in to clean up any messes.
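    A minimal sketch of that loop, assuming placeholder run_agent() and tests_pass() hooks rather than any real CLI: let the cheap, fast model retry against the test suite a few times, then hand whatever is left to the heavyweight model.
      def run_agent(model: str, task: str) -> None:
          # Placeholder: a real version would invoke the coding agent with `model`,
          # giving it the AGENTS.md / SKILLS.md scaffolding as context.
          print(f"{model}: {task}")

      def tests_pass() -> bool:
          # Placeholder: a real version would run the project's test suite.
          return False

      def solve(task: str, fast: str = "fast-model", heavy: str = "heavy-model",
                attempts: int = 3) -> None:
          for _ in range(attempts):
              run_agent(fast, task)            # cheap and fast, mistakes are fine
              if tests_pass():
                  return                       # good enough, no heavyweight needed
          run_agent(heavy, f"Review and clean up the previous attempts at: {task}")

      solve("fix the flaky date-parsing test")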
  • alexhans
    When I saw Spark my mind went to Apache Spark and wondered if we were learning all the lessons in orchestration of driver/worker and data shuffling from that space.
  • storus
    Anyone using OpenClaw to manage a bunch of coding agents so that you only set the high-level vision and leave all the prompting, testing, debugging, forking to agents? If yes, how did you glue it all together? Are you using local models? What is the SOTA for what I can run locally with a 512GB M3 Ultra, 2x DGX Spark, 2x RTX Pro 6000 Max-Q in one machine and 1x RTX Pro 6000 WS in another machine?
  • antirez
    The search for speed is in vain. On hard enough problems, Claude Code with Opus 4.6 can often give the impression of moving fast without really making progress, because it lacks focus on what matters. Then you spin up the much slower GPT-5.3-Codex and it fixes everything in 3 minutes of doing the right thing.
  • capevace
    Seems like the industry is moving further towards having low-latency/high-speed models for direct interaction, and slow, long-thinking models for longer tasks / deeper thinking.
    Quick/instant LLMs for human use (think UI). Slow, deep-thinking LLMs for autonomous agents.
  • OsrsNeedsf2P
    No hint on pricing. I'm curious whether faster is more expensive, given the slight trade-off in accuracy.
  • hchak
    Cerebras out here catching dubs. Does anyone know if Groq is running DGX Cloud inference or am I tripping?
  • Aeroi
    OpenAI naming is a meme at this point.
  • wxw
    Great stuff. People are getting used to agents as the interface for everything, even work as simple as "change label X to label Y". More speed on that front is welcome. The Codex "blended mode" they refer to will be useful (similar to Claude Code bouncing between Haiku and Opus).
    I imagine it's a win-win. This could significantly help their tokenomics.
    The example showing a plan being generated instantaneously is interesting. Human understanding will end up as the last, true bottleneck.
  • dalemhurley
    This is a win for agents: speed and intelligence are both crucial to the loop. If the time and token cost is small, you can iterate many times to correct mistakes.
    Got to wonder why Wall Street is dumping NVIDIA.
  • mynti
    Going by the rough numbers from the blog post, at ~1k tokens a second on Cerebras, this should put it at about the same size as GLM 4.7, which is also served at 1k tokens a second. And they say it is a smaller model than the normal Codex model.
  • jannniii
    This would be interesting if it were an open-weights model.
  • rprend
    Damn, this is the first thing to make me decide to try Codex, as a loyal Claude Code user.
  • cjbarber
    It'll be nice when there's smarter routing between models, or easier routing, so some things get sent to the fast model, some get sent to the cheap model, some get sent to the smart model, etc.
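    A toy heuristic for that routing, with made-up thresholds and model names; a production router would more likely use a small classifier model than keyword matching:
      def route(prompt: str) -> str:
          p = prompt.lower()
          mechanical = ("rename", "typo", "change label", "bump version")
          open_ended = ("design", "architecture", "debug", "why")
          if len(prompt) < 200 and any(w in p for w in mechanical):
              return "fast-model"    # small mechanical edits: latency matters most
          if any(w in p for w in open_ended):
              return "smart-model"   # open-ended reasoning: quality matters most
          return "cheap-model"       # everything else: cost matters most

      for task in ("change label X to label Y",
                   "why does the cache return stale entries under load?",
                   "add a CHANGELOG entry for v1.4"):
          print(f"{task!r} -> {route(task)}")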
  • modeless
    Why are they obscuring the price? It must be outrageously expensive.
  • throwup238
    Your move, Anthropic.
    (Yes, I know they released /fast last week, but I'm loving the constant one-upmanship.)
  • anonzzzies
    Been using glm 4.7 for this with opencode. Works really well.
  • system2
    I stopped using OpenAI tools recently after they increased the censorship. I can't even ask it to read the screen-capture software I am building, because it thinks I might use it for evil purposes.
  • desireco42
    Is it not available in Codex yet? I think this is fantastic and can't wait to try it; this is exactly the use case I need: something fast that performs based on my instructions.
    Cerebras is a winner here.
  • nusl
    These graphs are really weird. One only shows the 30-60% range, with the model(s) close to 60%; the other goes up to 80%, but the top model is at 77%.
  • tsss
    Does anyone want this? Speed has never been the problem for me; in fact, higher latency means less work for me as a replaceable corporate employee. What I need is the most intelligence possible; I don't care if I have to wait a day for an answer if the answer is perfect. Small code edits, like the ones presented as the use case here, I can do much better myself than by trying to explain to some AI what exactly I want done.
  • cjbarber
    For a bit, waiting for LLMs was like waiting for code to compile: https://xkcd.com/303/
    > more than 1000 tokens per second
    Perhaps, no more?
    (Not to mention, if you're waiting for one LLM, sometimes it makes sense to multi-table. I think Boris from Anthropic says he runs 5 CC instances in his terminal and another 5-10 in his browser on CC web.)
  • deskithere
    Anyway, the token eaters are upgrading their consumption capabilities.
  • allisdust
    Normal Codex itself is subpar compared to Opus. This might be even worse.
  • cactusplant7374
    I was really hoping it would support codex xhigh first.
  • jauntywundrkind
    Wasn't aware there was an effort to move to WebSockets. Is there any standards work for this, or is this just happening purely within the walled OpenAI garden?
    > Under the hood, we streamlined how responses stream from client to server and back, rewrote key pieces of our inference stack, and reworked how sessions are initialized so that the first visible token appears sooner and Codex stays responsive as you iterate. Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon.
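    There is no public spec for OpenAI's WebSocket path in that quote, so what follows is only a generic illustration of why a persistent socket helps: one handshake is amortized over many turns instead of paying connection and session-setup overhead per request. The endpoint and message shapes below are invented for the sketch.
      import asyncio
      import json
      import websockets  # pip install websockets

      async def session(uri: str = "wss://example.invalid/agent") -> None:
          # One connection for the whole session; each turn is just send + stream.
          async with websockets.connect(uri) as ws:
              for turn in ("rename foo to bar", "now add a unit test"):
                  await ws.send(json.dumps({"type": "user_turn", "text": turn}))
                  while True:
                      event = json.loads(await ws.recv())
                      if event.get("type") == "token":
                          print(event["text"], end="", flush=True)
                      elif event.get("type") == "done":   # server ends the turn
                          print()
                          break

      asyncio.run(session())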
  • behnamoh
    In my opinion, they solved the wrong problem. The main issue I have with Codex is that the best model is insanely slow, except at nights and weekends when Silicon Valley goes to bed. I don't want a faster, smaller model (already have that with GLM and MiniMax). I want a faster, better model (at least as fast as Opus).
    When they partnered with Cerebras, I kind of had a gut feeling that they wouldn't be able to use their technology for larger models, because Cerebras doesn't have a track record of serving models larger than GLM.
    It pains me that five days before my Codex subscription ends, I have to switch to Anthropic, because despite getting less quota compared to Codex, at least I'll be able to use my quota _and_ stay in the flow.
    But even Codex's slowness aside, it's just not as good of an "agentic" model as Opus. Here's what drove me crazy: https://x.com/OrganicGPT/status/2021462447341830582?s=20. The Codex model (gpt-5.3-xhigh) has no idea how to call agents, smh.
  • cowpig
    > Today, we’re releasing
    Releasing for real? Is it an open model?
  • rvz
    > Today, we’re releasing a research preview of GPT‑5.3-Codex-Spark, a smaller version of GPT‑5.3-Codex, and our first model designed for real-time coding. Codex-Spark marks the first milestone in our partnership with Cerebras, which we announced in January.
    Nevermind. [0]
    [0] https://news.ycombinator.com/item?id=35490837