Cohere launches North Mini Code: 30B MoE model for agents | Keryc
Cohere introduces North Mini Code, a model built to act as a coding agent in terminal and software-engineering environments. What makes it different? It’s not just size: it combines a sparse architecture, training aimed at agentic tasks, and an RL pipeline with verifiable rewards.
What is North Mini Code
North Mini Code is a Mixture-of-Experts (MoE) model with 30B total parameters but only about 3B active parameters per token. Cohere released it under the Apache 2.0 license, published the weights on Hugging Face, and integrated the model into OpenCode and the Cohere API.
It’s the first member of the North family and is designed specifically for agentic software engineering tasks: running commands in a terminal, using typed tools, editing repositories, and generating complex code with long context.
Architecture and technical design
It’s a decoder-only Transformer with sparse MoE layers whose FFN blocks are implemented as experts (128 experts in total, 8 activated per token).
Interleaved attention: sliding-window attention layers with RoPE alternate with global attention layers that use no positional embeddings, in a 3:1 ratio. Cohere uses an efficient attention implementation to scale to long contexts.
Each expert uses an FFN with SwiGLU activation. The router applies a sigmoid to logits before selecting the top-k. There’s also a dense layer before the sparse layers.
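To make the layer layout concrete, here is a minimal sketch of how such a stack could be scheduled. It assumes each group of four layers is three sliding-window layers followed by one global layer, and that exactly one dense-FFN layer comes first; the article gives only the 3:1 ratio and "a dense layer before the sparse layers", so the ordering, layer count, window size and names (`build_layer_schedule`, `sliding_window_mask`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    index: int
    attention: str  # "sliding" (local window, RoPE) or "global" (no positional embedding)
    ffn: str        # "dense" or "moe"


def build_layer_schedule(num_layers: int, dense_prefix: int = 1, ratio: int = 3) -> list[LayerSpec]:
    """Hypothetical schedule: `ratio` sliding-window layers, then one global layer,
    repeated; the first `dense_prefix` layers use a dense FFN, the rest use MoE."""
    specs = []
    for i in range(num_layers):
        attention = "global" if (i + 1) % (ratio + 1) == 0 else "sliding"
        ffn = "dense" if i < dense_prefix else "moe"
        specs.append(LayerSpec(i, attention, ffn))
    return specs


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal local mask: query q may attend to key k only if q - window < k <= q."""
    return [[q - window < k <= q for k in range(seq_len)] for q in range(seq_len)]


def global_causal_mask(seq_len: int) -> list[list[bool]]:
    """Full causal mask used by the global layers: q attends to every k <= q."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


if __name__ == "__main__":
    for spec in build_layer_schedule(8):
        print(spec)
    print(sum(sliding_window_mask(6, window=3)[5]))  # the last query sees 3 keys
```

The point of the split is cost: sliding-window layers attend to a fixed number of keys per query, so only the global layers pay the full quadratic price over a long context.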
Practical result? Less compute activated per token and greater capacity to keep distinct behaviors depending on context and tools.
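The routing step described above can be sketched in plain Python: a sigmoid over router logits, top-k selection, and SwiGLU experts where only the selected ones run. Renormalizing the selected scores so they sum to 1 is an assumption (the article does not say how the weights are combined), and the tiny dimensions and helper names are illustrative only.

```python
import math
import random


def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """w: list of rows (out_dim x in_dim), x: vector of length in_dim."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_expert(x, w_gate, w_up, w_down):
    """SwiGLU FFN: down(silu(gate(x)) * up(x))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


def sigmoid_topk_route(x, w_router, k):
    """Sigmoid over router logits, keep the top-k experts, then renormalize
    their scores to sum to 1 (renormalization is an assumption)."""
    scores = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w_router, x)]
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
    total = sum(scores[e] for e in top)
    return [(e, scores[e] / total) for e in top]


def moe_forward(x, w_router, experts, k):
    """Only the k routed experts are evaluated; the rest cost nothing for this token."""
    out = [0.0] * len(x)
    for e, weight in sigmoid_topk_route(x, w_router, k):
        y = swiglu_expert(x, *experts[e])
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ff, n_experts, top_k = 16, 32, 128, 8  # 128 experts, 8 active per token
    w_router = rand_matrix(n_experts, d_model, rng)
    experts = [(rand_matrix(d_ff, d_model, rng), rand_matrix(d_ff, d_model, rng),
                rand_matrix(d_model, d_ff, rng)) for _ in range(n_experts)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    print([e for e, _ in sigmoid_topk_route(x, w_router, top_k)])
    print(len(moe_forward(x, w_router, experts, top_k)))
```

With 8 of 128 experts active, each token touches 1/16 of the expert FFN weights, which is where the "less compute per token" comes from; attention and the dense layer still run for every token.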
Training: two SFTs and RLVR
The post-training is cascaded: two phases of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning with Verifiable Rewards (RLVR).
First SFT: a broad mix of programming, reasoning and instruction data, with code making up 70% of the trainable tokens. Contexts of 64K tokens.
Second SFT: 4.5B tokens focused on agentic and reasoning samples (61% code), with 128K-token contexts. This "long-to-longer" strategy consolidates skills on long traces.
RLVR: online, multi-environment training (Terminal and SWE) with binary rewards derived from verifiable unit tests. The objective is CISPO (Clipped IS-weight Policy Optimization), which clips the token-level importance-sampling weights instead of clipping the policy update as PPO does, so no token's gradient is dropped. Losses are aggregated at the token level to keep signal across long trajectories.
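A CISPO-style, token-level update can be sketched without an autograd library by computing the per-token factor that multiplies the gradient of each token's log-probability. This follows the published CISPO formulation in spirit; the clip bounds, the input layout and the function names are assumptions, not Cohere's code, and in a real framework the clipped weight would be detached (stop-gradient).

```python
import math


def cispo_token_coefficients(rollouts, advantages, eps_low=0.2, eps_high=0.2):
    """Per-token factor multiplying grad log pi_theta(token) in a CISPO-style update.

    rollouts:   list of rollouts; each is a list of (logp_new, logp_old) per token
    advantages: one scalar advantage per rollout, broadcast to all of its tokens
    """
    # Token-level aggregation: divide by the total token count of the batch,
    # so long trajectories are not shrunk to the weight of short ones.
    total_tokens = sum(len(r) for r in rollouts)
    coeffs = []
    for tokens, adv in zip(rollouts, advantages):
        row = []
        for logp_new, logp_old in tokens:
            ratio = math.exp(logp_new - logp_old)                     # importance-sampling weight
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)  # clip the weight, not the update
            row.append(clipped * adv / total_tokens)                  # would be detached in autograd
        coeffs.append(row)
    return coeffs


def cispo_surrogate_loss(rollouts, advantages, eps_low=0.2, eps_high=0.2):
    """Negative surrogate -sum(coeff * logp_new); its gradient w.r.t. each
    logp_new is -coeff, nonzero whenever the rollout's advantage is nonzero."""
    coeffs = cispo_token_coefficients(rollouts, advantages, eps_low, eps_high)
    return -sum(
        c * logp_new
        for row, tokens in zip(coeffs, rollouts)
        for c, (logp_new, _) in zip(row, tokens)
    )
```

For a token whose ratio is 2.0 with a positive advantage, PPO's clipped objective would give it zero gradient; here it still contributes, with its weight capped at 1 + eps_high.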
They also use a practical RL design: decoupled sampling with a vLLM sidecar that generates rollouts while the trainer learns, and a FIFO window queue to prevent long rollouts from blocking training.
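The decoupled sampler/trainer loop can be pictured as a producer/consumer over a bounded FIFO. In this minimal sketch, assuming a staleness limit measured in trainer steps (the article does not describe the queue's exact policy), the sampler pushes any rollout as soon as it finishes and the trainer pulls whatever is ready, so one very long rollout never holds up a batch. `Rollout`, `RolloutWindow` and `max_staleness` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    task_id: str
    policy_version: int  # trainer step whose weights the sampler used
    num_tokens: int
    reward: float


class RolloutWindow:
    """Hypothetical bounded FIFO between an inference sidecar and the trainer."""

    def __init__(self, capacity: int, max_staleness: int):
        self._queue = deque(maxlen=capacity)  # oldest entries fall off when full
        self.max_staleness = max_staleness

    def push(self, rollout: Rollout) -> None:
        """Called by the sampler whenever any rollout finishes, short or long."""
        self._queue.append(rollout)

    def pop_batch(self, batch_size: int, trainer_version: int) -> list:
        """Take up to batch_size finished rollouts in FIFO order, dropping stale ones."""
        batch = []
        while self._queue and len(batch) < batch_size:
            r = self._queue.popleft()
            if trainer_version - r.policy_version <= self.max_staleness:
                batch.append(r)
        return batch

    def __len__(self) -> int:
        return len(self._queue)
```

The staleness check is one way to keep off-policy drift bounded once sampling and training run at different speeds; the importance-weight correction in the RL objective handles the remaining mismatch.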
Robustness in harnesses and generalization
A key idea: train with several "harnesses" (agent interfaces) instead of optimizing for only one. They included data from SWE-Agent, mini-SWE-Agent, OpenCode and Terminus-2, with small portions of each format to force generalization.
What’s the gain? Cheap cross-transfer: adding 6% of alternate-harness data gave +10% on OpenCode without degrading SWE-Agent. On mini-SWE-Agent they reach 61.0% pass@1.
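One simple way to build such a mix is to fix a share per harness and turn the shares into integer sample counts. The sketch below uses largest-remainder rounding so counts always add up to the total; the 94/2/2/2 split is purely illustrative (it just spreads a 6% alternate share over three harnesses), and which harness is primary is not stated in the article.

```python
def allocate_mix(total_samples: int, shares: dict) -> dict:
    """Split total_samples across harnesses in proportion to `shares`
    (any positive numbers), using largest-remainder rounding."""
    norm = sum(shares.values())
    exact = {h: total_samples * s / norm for h, s in shares.items()}
    counts = {h: int(v) for h, v in exact.items()}  # floor of each exact share
    leftover = total_samples - sum(counts.values())
    # Hand the remaining samples to the harnesses with the largest fractional parts.
    for h in sorted(exact, key=lambda h: exact[h] - counts[h], reverse=True)[:leftover]:
        counts[h] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical split: one primary harness, 6% spread over three alternates.
    print(allocate_mix(1000, {"SWE-Agent": 94, "OpenCode": 2,
                              "mini-SWE-Agent": 2, "Terminus-2": 2}))
```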
They also apply sample-level filtering to remove invalid tool calls, malformed tokens and other pathologies that cause bad behaviors during RL.
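Sample-level filtering of this kind can be as simple as rejecting any trajectory that contains one bad tool call or a leaked special token. The sketch assumes tool calls are stored as JSON strings with `name` and `arguments` fields; the tool registry, the marker strings and the sample layout are all hypothetical.

```python
import json

# Hypothetical tool registry: tool name -> required argument names.
TOOL_SCHEMAS = {
    "bash": {"command"},
    "edit_file": {"path", "old_text", "new_text"},
}

# Hypothetical markers of malformed output (leaked special tokens, replacement chars).
BAD_TOKEN_MARKERS = ("<|", "|>", "\ufffd")


def tool_call_is_valid(raw_call: str) -> bool:
    """Valid if it parses as a JSON object naming a known tool with all required args."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name, args = call.get("name"), call.get("arguments")
    if name not in TOOL_SCHEMAS or not isinstance(args, dict):
        return False
    return TOOL_SCHEMAS[name].issubset(args)


def keep_sample(sample: dict) -> bool:
    """Drop the whole sample if any part of it is malformed."""
    text = sample.get("text", "")
    if any(marker in text for marker in BAD_TOKEN_MARKERS):
        return False
    return all(tool_call_is_valid(c) for c in sample.get("tool_calls", []))


def filter_samples(samples: list) -> list:
    return [s for s in samples if keep_sample(s)]
```

Filtering whole samples rather than patching individual calls is the conservative choice: a trajectory that recovered from a broken call could still teach the policy that broken calls are acceptable.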
Metrics and benchmarks (what actually matters)
Artificial Analysis Coding Index: North Mini Code scores 33.4, outperforming similarly sized models and several much larger models on agentic tasks and complex code generation.
Final SFT: 80.2% pass@10 on SWE-Bench Verified and 55.1% pass@10 on Terminal-Bench v2.
Improvement after RLVR: +7.9 absolute points in pass@1 on Terminal-Bench v2 and +3.0 points on SWE-Bench compared to the SFT checkpoint.
Human evaluation (pairwise): RLVR helps especially in code editing; the final version wins 66.1% of the time against the SFT checkpoint on evaluated samples.
They also report less repetitive trajectories, fewer invalid calls and shorter rollouts. In practice, the agent solves tasks faster and with fewer wasted steps.
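The pass@1 and pass@10 figures above are usually computed with the unbiased estimator from the Codex paper (Chen et al., 2021): generate n ≥ k attempts per task, count the c that pass, and estimate the chance that at least one of k drawn attempts passes. The article does not say how Cohere computed its numbers, so treat this as the common convention rather than their exact procedure.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: 1 - C(n - c, k) / C(n, k),
    with n attempts of which c pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list, k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

This also explains why pass@10 sits well above pass@1: a task that succeeds in only 3 of 10 attempts scores 0.3 on pass@1 but 1.0 on pass@10.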
Training and practical resources
They used over 70k verifiable tasks drawn from ~5k repositories, with deduplication against SWE-Bench and SWE-Bench-Pro to avoid data leakage.
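Decontamination against benchmarks can be done at two levels: drop every task from a repository the benchmark also uses, and drop tasks whose problem statement matches a benchmark one after normalization. The article does not say which criterion Cohere applied; the field names (`repo`, `problem_statement`) and the hashing scheme below are assumptions for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide duplicates."""
    return re.sub(r"\s+", " ", text.strip().lower())


def fingerprint(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def decontaminate(tasks: list, benchmark_tasks: list, exclude_repos: bool = True) -> list:
    """Keep training tasks that share neither a repo (optional) nor a
    normalized problem statement with any benchmark task."""
    bench_repos = {t["repo"] for t in benchmark_tasks}
    bench_prints = {fingerprint(t["problem_statement"]) for t in benchmark_tasks}
    kept = []
    for t in tasks:
        if exclude_repos and t["repo"] in bench_repos:
            continue
        if fingerprint(t["problem_statement"]) in bench_prints:
            continue
        kept.append(t)
    return kept
```

Repository-level exclusion is the stricter option: it also removes tasks that are not textual duplicates but would let the model memorize code it later sees at evaluation time.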
RL runs with 128K-token contexts, in batches of 512 rollouts with a group size of 8.
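With a group size of 8, a 512-rollout batch covers 64 tasks, 8 attempts each. A common way to turn binary test rewards into advantages is to compare each attempt with its own group (GRPO-style). Whether Cohere normalizes by the group's standard deviation, or at all, is not stated; the sketch below is an assumption.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list, group_size: int = 8, eps: float = 1e-6) -> list:
    """Split a flat batch into consecutive groups (one task per group) and return
    (reward - group mean) / (group std + eps) for every rollout (GRPO-style; assumption)."""
    if len(rewards) % group_size != 0:
        raise ValueError("batch size must be a multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu, sigma = mean(group), pstdev(group)
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


if __name__ == "__main__":
    batch = [1.0] + [0.0] * 7 + [1.0] * 8  # two tasks: 1/8 solved, 8/8 solved
    print([round(a, 3) for a in group_advantages(batch)])
```

Note what binary rewards imply: a group where all 8 attempts pass (or all fail) gets zero advantage everywhere and contributes no learning signal, so task difficulty matters as much as task count.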
Weights available in BF16 and FP8 (quantized) on Hugging Face for practical use.
What this means for you as a developer or product manager
If you build agents that handle terminals, CI pipelines or repository automation, North Mini Code offers a base geared to real tasks: long context windows, cross-harness robustness and training explicitly for interaction with verifiable tools.
Will you replace a senior dev with this tomorrow? No. Can you speed up bug fixes, test generation and editing assistants inside real workflows? Yes, and with fewer harness adjustments than you might expect.
Availability
North Mini Code is available in OpenCode, the Cohere API and on Hugging Face with weights in BF16 and FP8.