Ai2's Open Coding Agents release introduces SERA, a family of open-source agents designed so any team or researcher can train and adapt an agent to private code without the massive infrastructure that used to be necessary.
Can you imagine adapting an agent to your repo in hours and for a few hundred dollars? That's the promise here: models, data and reproducible recipes that cut cost and complexity for tasks like code generation, review, debugging and maintenance.
What SERA brings (technical summary)
SERA (Soft-verified Efficient Repository Agents) arrives with several practical improvements for developers and researchers.
Open models from 8B to 32B parameters, based on Qwen3 and trained with contexts of up to 32K tokens.
Reproducible methodology: the whole pipeline is SFT (supervised fine-tuning) on agentic trajectories, with no need for large RL infrastructure.
Dramatically reduced costs: reproducing the best prior open-source result costs roughly $400 on commercial GPUs; reaching performance competitive with industry models can cost up to $12,000.
Key innovations (how it works and why it’s cheap)
Ai2 proposes two technical ideas that make economical adaptation to private repos possible:
Soft-verified generation (SVG)
Synthetic generation normally demands fully correct, tested patches. SVG relaxes that requirement: patches can be partially correct and still useful for training agents. Why does this work? Because what teaches the agent isn't only absolute correctness of code, but patterns of transformation and reasoning in the workflow. That removes the need for expensive exhaustive testing infrastructure.
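The filtering idea can be sketched in a few lines. This is an illustrative reconstruction, not SERA's actual pipeline: the `Trajectory` record, `tests_passed`/`tests_total` fields, and the 0.5 threshold are all hypothetical stand-ins for whatever partial-correctness signal the real recipe uses.

```python
# Sketch of soft verification for synthetic patch trajectories.
# All names here (Trajectory, soft_verify, keep_threshold) are
# hypothetical; SERA's released recipes define the real schema.
from dataclasses import dataclass

@dataclass
class Trajectory:
    patch: str            # candidate patch produced by the teacher model
    tests_passed: int     # tests the patch passes
    tests_total: int      # tests exercised for this function

def soft_verify(trajs, keep_threshold=0.5):
    """Keep trajectories whose patches are *partially* correct.

    Hard verification would demand pass_rate == 1.0; a relaxed
    threshold retains imperfect patches that still demonstrate
    useful transformation and reasoning patterns.
    """
    kept = []
    for t in trajs:
        pass_rate = t.tests_passed / t.tests_total if t.tests_total else 0.0
        if pass_rate >= keep_threshold:
            kept.append(t)
    return kept

trajs = [
    Trajectory("fix A", 4, 4),   # fully correct -> kept
    Trajectory("fix B", 3, 4),   # partially correct -> still kept
    Trajectory("fix C", 0, 4),   # uninformative -> dropped
]
print(len(soft_verify(trajs)))  # 2
```

The practical payoff is that you no longer need an exhaustive test harness that can certify every synthetic patch; a cheap partial signal is enough to decide what goes into the training set.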
Bug-type menu and workflow fidelity
They use a taxonomy of 51 common bug patterns to diversify synthetic examples. For each function in a repo you can generate multiple trajectories with different bug styles. In addition, they prioritize that synthetic data reproduces the way a developer works (comments, reviews, iterations), not just the final result. That improves transfer to real repositories.
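A minimal sketch of how a bug-type menu diversifies data, assuming nothing about SERA's internals: the four entries and the prompt format below are illustrative placeholders for the real 51-entry taxonomy shipped with the release.

```python
# Sketch: pairing one function with several bug styles so each repo
# function yields multiple distinct synthetic trajectories.
# BUG_MENU is a tiny illustrative stand-in for SERA's 51-entry taxonomy.
import random

BUG_MENU = [
    "off-by-one in loop bound",
    "inverted boolean condition",
    "wrong default argument",
    "swapped function arguments",
    # ... the full taxonomy has 51 entries
]

def bug_prompts(function_source, n=3, seed=0):
    """Sample n distinct bug styles and build one injection prompt each."""
    rng = random.Random(seed)
    styles = rng.sample(BUG_MENU, k=min(n, len(BUG_MENU)))
    return [
        f"Introduce a '{style}' bug into:\n{function_source}"
        for style in styles
    ]

prompts = bug_prompts("def add(a, b):\n    return a + b", n=3)
print(len(prompts))  # 3
```

Sampling styles per function is what keeps the synthetic set from collapsing into one dominant bug pattern, which is the diversity property the taxonomy is there to guarantee.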
Performance and efficiency (numbers that matter)
The SERA-32B variant stands out: it achieves 54.2% on SWE-Bench Verified at 64K context, beating prior open-source models of similar size.
Training: ~40 GPU-days on a cluster with 2 NVIDIA Hopper or RTX PRO 6000 Blackwell Server Edition GPUs.
Cost comparisons: 57x cheaper than the SWE-smith technique and 26x cheaper than SkyRL on internal metrics.
Inference optimizations with NVIDIA:
BF16 on 4xH100: ~1,950 output tokens per second (peak) with 16K context.
FP8: ~3,700 tokens/s with almost negligible precision loss.
Blackwell 4xB200 in NVFP4: ~8,600 tokens/s peak.
These numbers make SERA useful in production even for demanding deployments.
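As a back-of-the-envelope check, the reported peaks imply the following speedups. Note that the setups differ (4xH100 vs 4xB200), so the NVFP4 ratio mixes precision and hardware effects; treat these as rough guides, not controlled benchmarks.

```python
# Speedup ratios implied by the reported peak throughputs.
bf16_h100 = 1950   # tokens/s, BF16 on 4xH100 at 16K context
fp8_h100 = 3700    # tokens/s, FP8 on 4xH100
nvfp4_b200 = 8600  # tokens/s, NVFP4 on 4xB200 (different hardware!)

print(f"FP8 vs BF16:   {fp8_h100 / bf16_h100:.1f}x")   # ~1.9x
print(f"NVFP4 vs BF16: {nvfp4_b200 / bf16_h100:.1f}x") # ~4.4x
```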
Specializing to private code: tests on Django and SymPy
The most interesting part for small teams: a SERA-32B model fine-tuned on only 8,000 synthetic trajectories per repository (roughly $1,300) can match or beat its 100B+ 'teacher' on repos like Django and SymPy. That means a smaller, cheaper, lower-latency model can replace a large generalist model for a specific domain.
In numbers: specializing at 32K context, SERA-32B reached 52.23% on Django and 51.11% on SymPy, versus 51.20% and 48.89% for GLM-4.5-Air, respectively.
How you'll use it (practical)
The release includes everything you need: models, synthetic data, training recipes, a CLI and inference optimizations. Deployment is lightweight: Ai2 says you can launch an inference server with two lines of code and the CLI is available on PyPI.
Are you an indie developer or a small company? You can:
Generate synthetic data from your repo using the bug-type menu.
Run SFT on an open model (e.g. SERA-8B or SERA-32B) on commodity hardware.
Validate locally and deploy with BF16/FP8 optimizations if you have NVIDIA GPUs.
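The second step above, turning agentic trajectories into SFT training examples, can be sketched as follows. The record layout (a chat-style `messages` list) is a common convention but hypothetical here; SERA's released recipes define the exact schema and tokenization.

```python
# Sketch: serializing one repo trajectory as a multi-turn SFT record.
# The {"role": ..., "content": ...} layout is an assumed chat format,
# not necessarily the one SERA's recipes use.
import json

def to_sft_example(issue, steps, patch):
    """Build one training record from an issue, intermediate steps, and a patch."""
    messages = [{"role": "user", "content": issue}]
    for step in steps:                       # tool calls, reviews, iterations
        messages.append({"role": "assistant", "content": step})
    messages.append({"role": "assistant", "content": patch})
    return json.dumps({"messages": messages})

record = to_sft_example(
    issue="Bug: add() returns wrong sum for negative inputs",
    steps=["Reading add() implementation...", "Found inverted condition."],
    patch="--- a/math.py\n+++ b/math.py\n...",
)
print(len(json.loads(record)["messages"]))  # 4
```

Keeping the intermediate steps in the record is the point of "workflow fidelity": the model learns the iteration pattern, not just the final diff.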
Limitations and technical recommendations
SERA is SFT-first: it doesn't use RL in its main recipe, so some advanced agentic behaviors might need extra steps.
The best teachers (like GLM-4.6) help in high-compute regimes, but a cheaper teacher can be the best choice in early iteration stages.
Even though SVG reduces the need for full testing, it's good practice to evaluate adaptations on representative test sets before deploying automated changes to production.
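A minimal sketch of that pre-deploy check, with a toy harness: `run_agent_on_task` is a stand-in for whatever evaluation you use (e.g. a SWE-Bench-style runner over a held-out set of issues from your own repo).

```python
# Sketch: measuring the resolve rate of a specialized agent on a
# held-out task set before auto-deploying its changes.
# run_agent_on_task is a hypothetical callback: True if the task
# was resolved (e.g. the patch passes the issue's tests).
def resolve_rate(tasks, run_agent_on_task):
    resolved = sum(1 for t in tasks if run_agent_on_task(t))
    return resolved / len(tasks)

# Toy harness: pretend tasks tagged "-easy" get resolved.
fake_tasks = ["django-101-easy", "django-102-hard", "sympy-7-easy"]
rate = resolve_rate(fake_tasks, lambda t: t.endswith("-easy"))
print(f"{rate:.0%}")  # 67%
```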
Why this matters
SERA lowers the barrier for small teams and labs to explore code agents tuned to private repositories. Instead of investing in massive infrastructure, you can now reproduce and adapt agents with reasonable costs and timelines, while keeping the science behind these systems reproducible.
Want to experiment? With models, data and scripts opening the box, the community can improve, audit and specialize agents in real domains without relying only on closed models.