OpenAI introduces GPT-5.3-Codex-Spark, a smaller, ultra-fast version of Codex built for coding in real time. Can you imagine seeing changes in your code almost instantly while you edit in VS Code or the terminal? That’s exactly what Codex-Spark aims for: an interactive experience where latency matters as much as intelligence.
What is Codex-Spark
Codex-Spark is a model optimized for fast inference and interactive work. It’s a smaller variant of GPT-5.3-Codex designed for near-instant responses: it can generate more than 1,000 tokens per second and has a 128k-token context window. For now it’s text-only and focused on targeted edits, refining logic, and quick interface changes.
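To put the throughput figure in perspective, here is a back-of-the-envelope estimate. It is a minimal sketch under stated assumptions: a constant decode rate of 1,000 tokens per second (the floor quoted above), a fixed time-to-first-token whose 0.2 s default is a hypothetical placeholder rather than a published number, "128k" read as 128,000 tokens, and output tokens sharing the context window with the prompt.

```python
# Back-of-the-envelope latency estimate for a streamed completion.
# Assumptions (hypothetical, for illustration only):
#   - decoding runs at a constant rate (1,000 tokens/s is the floor quoted for Codex-Spark)
#   - a fixed time-to-first-token (TTFT) precedes the first streamed token
#   - "128k" is read as 128,000 tokens, and output tokens count against the same window
CONTEXT_WINDOW_TOKENS = 128_000


def estimated_response_seconds(output_tokens: int,
                               tokens_per_second: float = 1000.0,
                               ttft_seconds: float = 0.2) -> float:
    """Return TTFT plus the time needed to stream `output_tokens` at a constant rate."""
    if output_tokens < 0 or tokens_per_second <= 0:
        raise ValueError("output_tokens must be >= 0 and tokens_per_second > 0")
    return ttft_seconds + output_tokens / tokens_per_second


def fits_in_context(prompt_tokens: int, output_tokens: int,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Check whether the prompt plus the expected output fits in the context window."""
    return prompt_tokens + output_tokens <= window


if __name__ == "__main__":
    # A 400-token edit: 0.2 s TTFT + 400 / 1000 s of decoding
    print(f"{estimated_response_seconds(400):.2f} s")  # 0.60 s
    print(fits_in_context(prompt_tokens=120_000, output_tokens=4_000))  # True
```

At this rate, decoding time for a typical focused edit is well under a second, so TTFT and network overhead become a large share of what you actually wait for.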
Who is it useful for? For developers who iterate in the moment: quick tests, guided refactoring, and real-time collaboration with an assistant that responds without long pauses.
Why it matters now
AI isn’t just for long background tasks anymore. What if you could interrupt, redirect, and iterate with the model in the same session, without waiting? Codex-Spark opens that path. Alongside the larger Codex models, it gives Codex two modes: long-running, high-level work, and instant collaboration for when you need results right now.
OpenAI is releasing it as a research preview to ChatGPT Pro users and a small group of API partners while it scales datacenter capacity and polishes the user experience.
How they achieved that speed
There are two key pieces: hardware and optimizations across the pipeline.
- Hardware: Codex-Spark runs on Cerebras’ Wafer Scale Engine 3, an accelerator designed for low-latency inference. OpenAI has integrated it into its production stack as a latency-first serving tier.
- Software and pipeline: OpenAI made a persistent WebSocket connection the default and rewrote parts of the inference stack. Together, these changes cut overhead per client/server round trip by 80%, per-token overhead by 30%, and time-to-first-token by 50%.
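The reported percentages are easier to reason about when applied to a concrete latency budget. The sketch below is a simplified model, not OpenAI’s: it splits one interactive turn into round-trip overhead, time-to-first-token, and per-token cost, assumes the model’s own decode time per token is unchanged, and treats the components as independent (in a real stack they overlap, so this may double count). It then applies the 80%, 30%, and 50% cuts. All baseline numbers in the example are hypothetical.

```python
from dataclasses import dataclass, replace

# Reductions reported for Codex-Spark's serving pipeline (from the text).
ROUND_TRIP_OVERHEAD_CUT = 0.80
PER_TOKEN_OVERHEAD_CUT = 0.30
TTFT_CUT = 0.50


@dataclass(frozen=True)
class TurnLatency:
    """Simplified latency model for one interactive turn (components assumed independent)."""
    round_trips: int                  # client/server round trips in the turn
    overhead_per_round_trip_s: float  # fixed cost paid on each round trip
    ttft_s: float                     # time-to-first-token
    output_tokens: int                # tokens streamed back to the client
    decode_per_token_s: float         # model compute per token (assumed unchanged here)
    overhead_per_token_s: float       # serving overhead added to each token

    def total_s(self) -> float:
        return (self.round_trips * self.overhead_per_round_trip_s
                + self.ttft_s
                + self.output_tokens * (self.decode_per_token_s + self.overhead_per_token_s))


def with_reported_cuts(t: TurnLatency) -> TurnLatency:
    """Return a copy with the reported reductions applied to the overhead terms and TTFT."""
    return replace(
        t,
        overhead_per_round_trip_s=t.overhead_per_round_trip_s * (1 - ROUND_TRIP_OVERHEAD_CUT),
        overhead_per_token_s=t.overhead_per_token_s * (1 - PER_TOKEN_OVERHEAD_CUT),
        ttft_s=t.ttft_s * (1 - TTFT_CUT),
    )


if __name__ == "__main__":
    # Hypothetical baseline numbers, chosen only for illustration.
    baseline = TurnLatency(round_trips=5, overhead_per_round_trip_s=0.1, ttft_s=0.4,
                           output_tokens=500, decode_per_token_s=0.001,
                           overhead_per_token_s=0.0002)
    improved = with_reported_cuts(baseline)
    print(f"baseline: {baseline.total_s():.2f} s")  # baseline: 1.50 s
    print(f"improved: {improved.total_s():.2f} s")  # improved: 0.87 s
```

With these made-up inputs, the round-trip term shrinks the most, which illustrates why cheaper round trips matter when a session consists of many short requests.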
Codex-Spark’s default working style is also lightweight, which keeps iteration snappy: it makes minimal, targeted edits and doesn’t run tests automatically unless you ask it to.
Performance and comparisons
On agentic software-engineering benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, GPT-5.3-Codex-Spark showed solid agentic capability while completing tasks much faster than GPT-5.3-Codex. In practice, that means useful fixes and changes arrive in a fraction of the time.
Limits, access and safety
- Access: rolling out first to ChatGPT Pro users in the Codex apps, the CLI, and the VS Code extension. API access is limited to a small group of design partners.
- Usage limits: during the research preview it has its own rate limits. Usage doesn’t count against standard quotas, and you may see queues or limited access when demand is high.
- Safety: Codex-Spark received the same safety training as OpenAI’s main models, including training focused on cybersecurity risks. OpenAI’s evaluation found that it does not reach the high-capability thresholds for cybersecurity or biology under its Preparedness Framework.
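Because the preview has its own rate limits and may queue requests when demand is high, client code that calls a rate-limited service usually benefits from retrying with backoff. This is a generic sketch of capped exponential backoff with full jitter, not part of any OpenAI SDK: `RateLimitedError` and `send_request` are hypothetical stand-ins for whatever error and call your client actually uses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error a client raises when the service reports a rate limit or a full queue."""


def call_with_backoff(send_request: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay_s: float = 0.5,
                      max_delay_s: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send_request`, retrying on RateLimitedError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error to the caller
            # The cap doubles each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay_s.
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Full jitter: sleep a random fraction of the cap so clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_request() -> str:
        # Simulated request that is rate-limited twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RateLimitedError("queued")
        return "ok"

    print(call_with_backoff(flaky_request, sleep=lambda s: None))  # ok
```

Injecting `sleep` keeps the helper testable; in real use you would keep the default `time.sleep`. Full jitter trades a less predictable delay for less synchronized retry traffic when many clients hit the same limit.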
What you can expect and what’s next
Codex-Spark is the first member of a family of ultra-fast models. As developers use it, expect refinements and new capabilities: larger models, longer context windows, and multimodal input. The longer-term vision is also clear: combining GPUs and Cerebras hardware in the same workflows to balance cost, scale, and latency.
If you work on products that need immediate interaction, such as editors, DevOps tools, or IDE assistants, this changes the rhythm of feedback. Imagine requesting a refactor and watching the code adjust while you keep typing, or iterating on an interface and getting small corrections instantly.
In the end, the question isn’t just whether AI is capable, but how fast and natural it feels to use. Codex-Spark pushes in that direction: it complements long-running models rather than replacing them, and above all it reduces the friction between an idea and working code.
