Our Take
A genuinely efficient coding agent baseline that trades size for post-training discipline (multi-stage SFT, verifiable rewards, cross-harness robustness), but the benchmarks are vendor-reported and no independent reproduction has been published yet.
Why it matters
Teams building code agents can now start with a 30B sparse model with 3B active parameters instead of models in the 120B class, cutting inference cost and latency. The architectural choices (interleaved attention, token-level RL aggregation, async sampling) are worth studying regardless of which model you deploy.
Do this week
Benchmark North Mini Code against your internal coding tasks and agent harnesses before committing to larger models, and run its output through your own verifier rather than assuming the 33.4 Coding Index score carries over to your environment.
Cohere releases North Mini Code: 30B sparse MoE with 3B active parameters
Cohere has released North Mini Code as the first model in its new family designed specifically for agentic software engineering tasks. The model is a decoder-only sparse Mixture-of-Experts Transformer with 128 experts, 8 of which are activated per token. It interleaves sliding-window and global attention layers in a 3:1 ratio and uses SwiGLU feed-forward blocks and an efficient attention implementation.
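The two architectural ideas above, a 3:1 sliding-window-to-global layer pattern and top-8-of-128 expert routing, can be sketched in a few lines. This is an illustrative sketch only: the layer count, the position of the global layer within each group of four, and the choice to softmax over the selected logits are assumptions, not details from Cohere.

```python
import math

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # experts activated per token, from the article


def attention_pattern(num_layers, local_per_global=3):
    """Label each layer 'sliding' or 'global' in a repeating 3:1 pattern.

    Assumption: the global layer is the last one in each group of four.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding"
            for i in range(num_layers)]


def route_token(router_logits, top_k=TOP_K):
    """Select the top_k experts and normalize their logits into gate weights.

    Assumption: softmax is taken over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda e: router_logits[e], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_logits[e] for e in chosen)  # for numerical stability
    exps = {e: math.exp(router_logits[e] - peak) for e in chosen}
    total = sum(exps.values())
    return {e: w / total for e, w in exps.items()}


if __name__ == "__main__":
    print(attention_pattern(8))  # hypothetical 8-layer stack
    gates = route_token([float(i) for i in range(NUM_EXPERTS)])
    print(sorted(gates))         # indices of the 8 selected experts
```

Only the selected experts' feed-forward blocks run for a given token, which is where the gap between total and active parameters comes from.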
According to Cohere's published benchmarks, North Mini Code scores 33.4 on Artificial Analysis' Coding Index, outperforming Qwen 3.5 (35B-A3B), Gemma 4 (26B-A4B), and substantially larger models including Nemotron 3 Super (120B-A12B), Mistral Small 4 (119B-A6B), and Devstral 2 (123B). After supervised fine-tuning, the model achieves 80.2% pass@10 on SWE-Bench Verified and 55.1% pass@10 on Terminal-Bench v2. After reinforcement learning with verifiable rewards (RLVR), it reaches 61.0% pass@1 on the mini-SWE-agent harness. The pass@10 and pass@1 figures measure different things and are not directly comparable.
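For readers comparing pass@1 and pass@10 numbers, the sketch below shows the widely used unbiased pass@k estimator: given n sampled attempts per task of which c pass, it estimates the probability that at least one of k attempts succeeds. Whether Cohere computed its figures with this exact estimator is an assumption; the article does not say.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: attempts sampled for the task, c: attempts that passed, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical task: 10 attempts sampled, 3 of them pass the tests.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 10))
```

The benchmark-level score is the mean of this value across tasks. Because pass@10 only needs one success in ten attempts, it is always at least as high as pass@1 for the same model.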
Post-training discipline, not scale, drives the gains
The model's efficiency comes from a two-stage cascaded supervised fine-tuning pipeline followed by RLVR. The first SFT stage uses 64K context and a mixed dataset where code forms 70% of trainable tokens (43% agentic tool-use data, 27% competitive or scientific programming). The second stage trains on 128K context using only 4.5 billion tokens of high-quality agentic and reasoning-driven samples, where code forms 61% of trainable tokens and all tool calls and completions are verified as executable.
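The two SFT stages can be written down as a small config sketch that checks the reported mix is internally consistent. The class and field names are hypothetical, "64K" and "128K" are assumed to mean multiples of 1024, and the 4.5 billion figure is assumed to count trainable tokens. Values the article does not give are left as `None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SFTStage:
    name: str
    context_length: int
    code_fraction: float                 # share of trainable tokens that is code
    total_tokens: Optional[int] = None   # None where the article gives no figure
    code_breakdown: dict = field(default_factory=dict)

    def validate(self):
        """Check that the code sub-categories add up to the code share."""
        if self.code_breakdown:
            total = sum(self.code_breakdown.values())
            if abs(total - self.code_fraction) > 1e-9:
                raise ValueError(f"{self.name}: breakdown sums to {total}")

    def code_tokens(self):
        """Approximate number of code tokens, if the stage size is known."""
        if self.total_tokens is None:
            return None
        return round(self.total_tokens * self.code_fraction)


STAGE_1 = SFTStage(
    name="sft-stage-1",
    context_length=64 * 1024,
    code_fraction=0.70,
    code_breakdown={"agentic_tool_use": 0.43, "competitive_or_scientific": 0.27},
)
STAGE_2 = SFTStage(
    name="sft-stage-2",
    context_length=128 * 1024,
    code_fraction=0.61,
    total_tokens=4_500_000_000,
)

if __name__ == "__main__":
    for stage in (STAGE_1, STAGE_2):
        stage.validate()
        print(stage.name, stage.context_length, stage.code_tokens())
```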
Rather than optimize for quantitative metrics during SFT, Cohere treated it as priming for RLVR, relying on sample-level filtering to remove invalid tool calls, malformed tokens, and hallucinated citations. Over 70,000 verifiable tasks across approximately 5,000 unique repositories were used, with deduplication against SWE-Bench and SWE-Bench-Pro to avoid source leakage.
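A minimal sketch of sample-level filtering and benchmark deduplication follows. It assumes tool calls are stored as JSON strings with a `name` field and that deduplication is done by repository name; both are illustrative assumptions, and the `Sample` type and repository names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Sample:
    repo: str          # e.g. "owner/name" (hypothetical field)
    tool_calls: list   # raw tool-call strings emitted in the trajectory


def tool_calls_are_valid(sample):
    """True only if every tool call parses as a JSON object with a 'name'."""
    for raw in sample.tool_calls:
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            return False
        if not isinstance(call, dict) or "name" not in call:
            return False
    return True


def filter_samples(samples, benchmark_repos):
    """Drop samples from benchmark repositories or with malformed tool calls."""
    blocked = {repo.lower() for repo in benchmark_repos}
    return [s for s in samples
            if s.repo.lower() not in blocked and tool_calls_are_valid(s)]


if __name__ == "__main__":
    samples = [
        Sample("acme/widgets", ['{"name": "bash", "arguments": {"cmd": "ls"}}']),
        Sample("acme/broken", ['{"name": ']),                # malformed JSON
        Sample("Example/Benchmark-Repo", ['{"name": "bash"}']),
    ]
    kept = filter_samples(samples, {"example/benchmark-repo"})
    print([s.repo for s in kept])
```

A production pipeline would likely also compare commits or issue text, since the same code can appear under forks with different repository names.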
The RLVR stage used an asynchronous training loop (a vLLM sidecar fed rollouts continuously to an offline learner) to handle variable-length code traces. Weights were exported every four learner steps, and the model trained on both terminal-based and software engineering tasks simultaneously using binary rewards derived from unit-test-based verifiers. RLVR improved pass@1 performance by 7.9 percentage points on Terminal-Bench v2 and 3.0 percentage points on SWE-Bench.
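To illustrate what exporting weights every four learner steps implies, here is a single-threaded toy simulation, not an actual asynchronous loop and not Cohere's code. It tracks how many updates the learner is ahead of the weights that produced each rollout, and includes a binary reward of the kind the article describes. The function names and the batch size are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

EXPORT_EVERY = 4  # learner steps between weight exports, from the article


def binary_reward(test_results):
    """1.0 only if there is at least one test and every test passed."""
    return 1.0 if test_results and all(test_results) else 0.0


@dataclass
class Rollout:
    policy_version: int  # learner step whose weights generated this rollout
    reward: float


def simulate(num_steps, rollouts_per_step=2):
    """Return (export_steps, per-step policy lag) for a toy run."""
    buffer = deque()
    sampler_version = 0  # weights the sampler currently holds
    export_steps, lag = [], []
    for step in range(1, num_steps + 1):
        # Sampler side: keeps generating with its current, possibly stale, weights.
        for _ in range(rollouts_per_step):
            buffer.append(Rollout(sampler_version, binary_reward([True, True])))
        # Learner side: consume one batch and apply one update.
        batch = [buffer.popleft() for _ in range(rollouts_per_step)]
        updates_applied = step - 1
        lag.append(max(updates_applied - r.policy_version for r in batch))
        if step % EXPORT_EVERY == 0:
            sampler_version = step  # export fresh weights to the sampler
            export_steps.append(step)
    return export_steps, lag


if __name__ == "__main__":
    exports, lag = simulate(8)
    print("exports at steps:", exports)
    print("policy lag per step:", lag)
```

In this toy setup the rollouts are never more than three updates behind the learner. A real asynchronous system would see additional lag from long-running traces that finish after an export.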
Cross-harness generalization and inference cost matter more than benchmark scores
Cohere trained North Mini Code on multiple agent harnesses (SWE-Agent, mini-SWE-agent, OpenCode, and Terminal-Bench's Terminus 2) rather than optimizing for a single one. Adding just 6% benchmark harness data during the second SFT stage yielded a 10% gain when evaluated with OpenCode while maintaining SWE-Bench Verified performance. This matters because real agents encounter diverse tooling environments with different CLI interfaces, structured JSON responses, and raw stdout formats.
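When testing a model across harnesses yourself, one practical step is to normalize tool observations into a single shape before scoring. The sketch below handles two of the formats mentioned above, structured JSON responses and raw stdout. The `output` and `exit_code` field names are hypothetical and do not correspond to any specific harness.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    output: str
    exit_code: Optional[int]  # None when the harness does not report one


def normalize(raw):
    """Convert a harness response (JSON object or raw stdout) to an Observation."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: treat the text as raw stdout.
        return Observation(output=raw, exit_code=None)
    if isinstance(payload, dict) and "output" in payload:
        code = payload.get("exit_code")
        return Observation(
            output=str(payload["output"]),
            exit_code=code if isinstance(code, int) else None,
        )
    # Valid JSON that is not a structured response (e.g. a bare number).
    return Observation(output=raw, exit_code=None)


if __name__ == "__main__":
    print(normalize('{"output": "2 passed", "exit_code": 0}'))
    print(normalize("2 passed in 0.31s\n"))
```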
The sparse activation pattern (8 of 128 experts per token, about 3B of 30B parameters) significantly reduces per-token inference compute compared to a 30B dense model, although all 30B parameters still have to be held in memory. Practitioners building code agents should test the model against their own tool harnesses and internal verification pipelines before assuming the published benchmarks translate to their environment. The model is available under Apache 2.0 on Hugging Face.
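As a rough back-of-the-envelope check on the compute side, the sketch below uses the common approximation that a forward pass costs about 2 FLOPs per active parameter per token. That approximation ignores attention cost over long contexts, and the rounded 30B and 3B parameter counts are taken from the article's headline figures.

```python
TOTAL_PARAMS = 30e9    # rounded, from the article
ACTIVE_PARAMS = 3e9    # rounded, from the article
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 8


def forward_flops_per_token(active_params):
    """Approximate forward-pass FLOPs per token: about 2 per active parameter."""
    return 2.0 * active_params


if __name__ == "__main__":
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)
    print("expert fraction active:", ACTIVE_EXPERTS / NUM_EXPERTS)
    print("dense / sparse compute ratio:", dense / sparse)
```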