Ai Coding 3 min read

Cohere Ships 30B MoE North-Mini-Code for Local Coding Agents

Cohere Labs has released North-Mini-Code-1.0, an Apache 2.0 licensed 30 billion parameter mixture-of-experts model optimized for local coding workflows.

On June 9, 2026, Cohere Labs released North-Mini-Code-1.0, an open-source model specialized for local software engineering tasks. The release marks the beginning of the North model family, targeting enterprise environments that require sovereign AI capabilities. For developers building tools for code generation or automated testing, the model offers a high-speed alternative designed specifically for consumer hardware.

Architecture and Hardware Requirements

North-Mini-Code-1.0 utilizes a sparse Mixture-of-Experts (MoE) Transformer decoder design. The model contains 30 billion total parameters but activates only 3 billion parameters per token. This configuration enables efficient local inference on machines with limited memory.

The architecture heavily customizes its attention layers. It interleaves sliding-window self-attention with global self-attention at a 3:1 ratio. The sliding-window layers use Rotary Positional Embeddings, while the global layers operate without positional embeddings. The model features 128 total experts, activating 8 per token.

Developers can run the model locally using roughly 20GB of RAM via MLX on a Mac Studio or on a single NVIDIA H100. It supports a 256,000-token input context and a 64,000-token output limit.

Benchmark Performance

Cohere optimized North-Mini-Code-1.0 specifically for programming tasks rather than general reasoning. Independent audits by Artificial Analysis show the model scoring 33.4 on the Coding Index. This result places it ahead of Qwen3.5 (35B-A3B) and Gemma 4 (26B-A4B).

Performance drops significantly on broad logic tasks. The model scored 14% on GDPval-AA and 37% on $\tau^2$-Bench Telecom. This divergence confirms the model trades general knowledge for specialized programming capabilities.

Benchmark IndexScorePerformance Context
Artificial Analysis Coding33.4Outperforms Qwen3.5 35B-A3B
Artificial Analysis Intelligence27.6Outperforms gpt-oss-20B
$\tau^2$-Bench Telecom37%Indicates narrow task optimization
GDPval-AA14%Weak general reasoning

During execution, the sparse architecture achieves inference speeds of 199 to 210 output tokens per second.

Integration and Availability

The model is licensed under Apache 2.0 and available as open weights on Hugging Face. Developers can also access it through the Cohere Chat V2 API.

Cohere pre-trained the model with native tool-use capabilities for terminal-based agentic workflows. It integrates directly with deployment platforms like OpenCode and Model Vault. This setup targets enterprise teams managing sensitive codebases that cannot be routed through external cloud APIs.

If you manage air-gapped development environments, North-Mini-Code-1.0 provides a fast on-device coding assistant. You can deploy it using standard consumer hardware without compromising your internal security perimeter.

Get Insanely Good at AI

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.

Keep Reading