Cohere Ships 30B MoE North-Mini-Code for Local Coding Agents

On June 9, 2026, Cohere Labs released North-Mini-Code-1.0, an open-source model specialized for local software engineering tasks. The release marks the beginning of the North model family, targeting enterprise environments that require sovereign AI capabilities. For developers building tools for code generation or automated testing, the model offers a high-speed alternative designed specifically for consumer hardware.

Architecture and Hardware Requirements

North-Mini-Code-1.0 utilizes a sparse Mixture-of-Experts (MoE) Transformer decoder design. The model contains 30 billion total parameters but activates only 3 billion parameters per token. This configuration enables efficient local inference on machines with limited memory.

The architecture heavily customizes its attention layers. It interleaves sliding-window self-attention with global self-attention at a 3:1 ratio. The sliding-window layers use Rotary Positional Embeddings, while the global layers operate without positional embeddings. The model features 128 total experts, activating 8 per token.

Developers can run the model locally using roughly 20GB of RAM via MLX on a Mac Studio or on a single NVIDIA H100. It supports a 256,000-token input context and a 64,000-token output limit.

Benchmark Performance

Cohere optimized North-Mini-Code-1.0 specifically for programming tasks rather than general reasoning. Independent audits by Artificial Analysis show the model scoring 33.4 on the Coding Index. This result places it ahead of Qwen3.5 (35B-A3B) and Gemma 4 (26B-A4B).

Performance drops significantly on broad logic tasks. The model scored 14% on GDPval-AA and 37% on $\tau^2$-Bench Telecom. This divergence confirms the model trades general knowledge for specialized programming capabilities.

Benchmark Index	Score	Performance Context
Artificial Analysis Coding	33.4	Outperforms Qwen3.5 35B-A3B
Artificial Analysis Intelligence	27.6	Outperforms gpt-oss-20B
$\tau^2$-Bench Telecom	37%	Indicates narrow task optimization
GDPval-AA	14%	Weak general reasoning

During execution, the sparse architecture achieves inference speeds of 199 to 210 output tokens per second.

Integration and Availability

The model is licensed under Apache 2.0 and available as open weights on Hugging Face. Developers can also access it through the Cohere Chat V2 API.

Cohere pre-trained the model with native tool-use capabilities for terminal-based agentic workflows. It integrates directly with deployment platforms like OpenCode and Model Vault. This setup targets enterprise teams managing sensitive codebases that cannot be routed through external cloud APIs.

If you manage air-gapped development environments, North-Mini-Code-1.0 provides a fast on-device coding assistant. You can deploy it using standard consumer hardware without compromising your internal security perimeter.

Cohere Ships 30B MoE North-Mini-Code for Local Coding Agents

Architecture and Hardware Requirements

Benchmark Performance

Integration and Availability

Keep Reading

How to Build Programmatic Agents With the Cursor SDK

12B JetBrains Mellum2 MoE Fits Agent Routines on One H100

Osaurus Pivots to Unified macOS Agent Platform With Linux VMs

GLM-5.1 MoE Beats GPT-5.4 in Open-Source Engineering Milestone

Model-Agnostic Cloud Runtime for Coding Agents Secures $7M Seed