Today OpenAI introduces GPT-5.1-Codex-Max, a new agent model for programming tasks inside Codex. It's designed for long, complex work: project-scale refactors, deep debugging and agent loops that run for hours.
What is GPT-5.1-Codex-Max
GPT-5.1-Codex-Max is an updated version of OpenAI's core reasoning model, trained specifically for agent tasks in software engineering, mathematics and research. Its two biggest improvements are the ability to work coherently across multiple context windows and greater token efficiency.
A feature called compaction lets it condense its working history, preserving the essential context while freeing space to keep going. The result: it can sustain tasks that used to fail on context window limits, including sessions lasting more than 24 hours in internal tests.
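OpenAI hasn't published how compaction is implemented, but the general technique is familiar from agent frameworks: when the history nears the context limit, older turns get condensed into a summary that stays in the prompt. Here is a minimal sketch of that idea, with a rough token estimate and a stand-in `summarize` helper (both are illustrative and not part of any Codex API):

```typescript
// Illustrative sketch of context compaction; not OpenAI's actual implementation.
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Rough token estimate (~4 characters per token) and a stand-in summarizer.
// In practice the summary would come from a model call, not string truncation.
const estimateTokens = (t: Turn) => Math.ceil(t.content.length / 4);
const summarize = async (turns: Turn[]) =>
  turns.map((t) => t.content.slice(0, 80)).join(" | ");

async function compact(history: Turn[], budget: number, keepRecent = 10): Promise<Turn[]> {
  const total = history.reduce((sum, t) => sum + estimateTokens(t), 0);
  if (total <= budget || history.length <= keepRecent) return history; // still fits

  // Condense everything except the most recent turns into a single summary turn,
  // freeing space while preserving the gist of the earlier work.
  const older = history.slice(0, -keepRecent);
  const recent = history.slice(-keepRecent);
  return [
    { role: "assistant", content: `Summary of earlier work: ${await summarize(older)}` },
    ...recent,
  ];
}
```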
GPT-5.1-Codex-Max can complete complex refactors, iterate on failing tests and keep agent loops running for long periods without losing coherence.
What you can do with it
- Use it today across Codex surfaces: the CLI, the IDE extension, cloud tasks and code review. API access is coming soon.
- Run long-running jobs: large projects, deep debugging sessions and automation of repetitive tasks.
- Generate complex artifacts with fewer tokens, which translates into real cost savings. OpenAI reports that at medium reasoning effort the model achieves better results using 30% fewer thinking tokens than the previous version.
A concrete example shown in the announcement: asking it to generate a browser app that trains a policy for the CartPole environment, with an SVG visualizer of the network, metrics and saving to index.html. That shows it can produce self-contained projects with training and visualization built in.
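To make that concrete, the core of such an app is the classic cart-pole dynamics plus a policy to train. The sketch below uses the standard Gym-style constants, Euler integration and a simple linear policy; it's illustrative, not code the model produced, and the training loop and SVG rendering would sit on top of it:

```typescript
// Minimal CartPole dynamics plus a linear policy: the core of what the prompted
// browser app would need before adding training and the SVG visualizer.
type State = { x: number; xDot: number; theta: number; thetaDot: number };

const G = 9.8, M_CART = 1.0, M_POLE = 0.1, LEN = 0.5, FORCE = 10.0, TAU = 0.02;
const TOTAL = M_CART + M_POLE, POLE_ML = M_POLE * LEN;

// One Euler integration step of the classic cart-pole equations.
function step(s: State, action: 0 | 1): State {
  const force = action === 1 ? FORCE : -FORCE;
  const cos = Math.cos(s.theta), sin = Math.sin(s.theta);
  const temp = (force + POLE_ML * s.thetaDot ** 2 * sin) / TOTAL;
  const thetaAcc = (G * sin - cos * temp) / (LEN * (4 / 3 - (M_POLE * cos ** 2) / TOTAL));
  const xAcc = temp - (POLE_ML * thetaAcc * cos) / TOTAL;
  return {
    x: s.x + TAU * s.xDot,
    xDot: s.xDot + TAU * xAcc,
    theta: s.theta + TAU * s.thetaDot,
    thetaDot: s.thetaDot + TAU * thetaAcc,
  };
}

// Episode ends when the cart leaves the track or the pole tips past ~12 degrees.
const done = (s: State) => Math.abs(s.x) > 2.4 || Math.abs(s.theta) > 0.2095;

// A linear policy: push right if the weighted state is positive. Training
// (e.g. policy gradient or random search) would adjust these weights.
const act = (s: State, w: number[]): 0 | 1 =>
  w[0] * s.x + w[1] * s.xDot + w[2] * s.theta + w[3] * s.thetaDot > 0 ? 1 : 0;
```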
Performance and metrics
OpenAI shares measurable improvements on frontier evaluations:
- SWE-Bench Verified: 73.7% (GPT-5.1-Codex) vs 77.9% (Codex-Max)
- SWE-Lancer IC SWE: 66.3% vs 79.9%
- TerminalBench 2.0: 52.8% vs 58.1%
There's also a new reasoning-effort level, xhigh, for latency-insensitive tasks: the model thinks longer to produce better answers. For everyday use, OpenAI recommends medium as the balance between speed and quality.
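Since API access is still listed as coming soon, the following is only a speculative sketch: it assumes the model eventually appears in the Responses API under the announced name and that the usual reasoning effort parameter accepts the announced levels. Nothing here is confirmed in published API docs yet.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Assumptions: the "gpt-5.1-codex-max" identifier and the "xhigh" effort level come
// from the announcement; neither is confirmed in the API reference at time of writing.
const response = await client.responses.create({
  model: "gpt-5.1-codex-max",
  reasoning: { effort: "medium" }, // recommended default; "xhigh" for latency-insensitive jobs
  input: "Refactor this module's retry logic to use exponential backoff.",
});

console.log(response.output_text);
```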
Compatibility and workflow
GPT-5.1-Codex-Max was trained on real engineering tasks: creating pull requests, code reviews, frontend work and technical Q&A. It's the first model trained to operate in Windows environments and was optimized to collaborate with the Codex CLI experience.
Codex now uses GPT-5.1-Codex-Max by default across its surfaces, and OpenAI recommends using the Codex family for agent tasks and Codex-like environments instead of general-purpose models.
Security and best practices
Codex runs in a secure sandbox by default: file access is limited to the workspace and network is disabled unless the developer enables it. This reduces risks from prompt injection and unwanted access.
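As a conceptual illustration of that rule (not Codex's actual enforcement code), the file-access check amounts to refusing any path that resolves outside the workspace root:

```typescript
import path from "node:path";

// Conceptual sketch of the workspace rule: a requested path is only allowed
// if it resolves to somewhere inside the workspace root.
function isInsideWorkspace(workspaceRoot: string, requested: string): boolean {
  const resolved = path.resolve(workspaceRoot, requested);
  const relative = path.relative(workspaceRoot, resolved);
  // Paths that escape the root come back starting with ".." (or absolute on another drive).
  return !relative.startsWith("..") && !path.isAbsolute(relative);
}

console.log(isInsideWorkspace("/home/dev/project", "src/index.ts"));     // true
console.log(isInsideWorkspace("/home/dev/project", "../../etc/passwd")); // false
```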
OpenAI maintains dedicated cybersecurity monitoring, detects and blocks suspicious activity, and is preparing additional mitigations as capabilities evolve. Although GPT-5.1-Codex-Max is the most capable cybersecurity model it has deployed so far, under its internal framework the model doesn't reach the High capability category, so extra safeguards are being strengthened.
Important: treat the agent as an extra reviewer, not a replacement for human judgment. Check terminal logs, tool calls and test results before applying changes to production.
Impact for developers and companies
OpenAI reports that 95% of its engineers use Codex weekly and that those who use it ship about 70% more pull requests. The better token efficiency and the ability to keep working over long stretches make complex tasks both more manageable and more economical.
If you're a developer, think of GPT-5.1-Codex-Max as a partner that holds long contexts, proposes project-wide refactors and automates iterations, but one that still needs human supervision before anything ships to production.
Practical reflection
Is it worth trying now? If you work on large projects, refactors or automations that currently break against context limits, yes. If your priority is privacy, or your workflows depend on restricted network access, keep the default sandbox settings and audit the agent's outputs.
The arrival of GPT-5.1-Codex-Max is a clear step toward AI tools that can collaborate for hours on real problems. Will you give it a place in your workflow to speed up refactors and reviews, or use it only as an occasional assistant? Either way, remember to supervise and validate before you deploy.
