Cohere Releases 218B Command A+ Under Apache 2.0 and Acquires Reliant AI
Cohere expanded its sovereign AI strategy by open-sourcing the 218-billion-parameter Command A+ model and acquiring biopharma startup Reliant AI.
On May 20, 2026, Cohere expanded its enterprise infrastructure stack with the release of Command A+ and the acquisition of biopharma specialist Reliant AI. Command A+ is the company’s first model released under the permissive Apache 2.0 license, with full weight access for local and air-gapped deployments. Together, the two announcements advance a sovereign AI strategy aimed at regulated industries and government infrastructure.
Command A+ Architecture and Efficiency
Command A+ is a decoder-only Sparse Mixture-of-Experts (MoE) Transformer. It contains 218 billion total parameters, with 25 billion active during generation. The architecture distributes knowledge across 128 total experts, activating 8 per token alongside a single shared expert applied to all tokens.
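The routing described above can be sketched as a toy in plain Python. This is not Cohere's implementation: the article gives only the expert counts, so the gating rule (softmax renormalization over the top-8 router scores), the toy expert functions, and the helper names `route_token` and `moe_layer` are assumptions for illustration.

```python
import math
import random

NUM_ROUTED_EXPERTS = 128  # total routed experts, per the article
TOP_K = 8                 # routed experts activated per token, per the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top-k experts by router score and renormalize their weights.

    Assumption: top-k-then-softmax gating; the article does not specify
    the gating function Command A+ uses.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(hidden, router_logits, experts, shared_expert):
    """Combine the always-on shared expert with the weighted top-k routed experts."""
    out = shared_expert(hidden)  # the shared expert processes every token
    for idx, weight in route_token(router_logits):
        expert_out = experts[idx](hidden)  # only TOP_K routed experts execute
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


def shared_expert(h):
    """Hypothetical shared expert: identity for this toy."""
    return list(h)


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda h, s=i + 1: [s * x for x in h] for i in range(NUM_ROUTED_EXPERTS)]

random.seed(0)
hidden = [1.0, 0.5, -0.5, 2.0]
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
print(len(route_token(logits)))  # routed experts used for this token
print(moe_layer(hidden, logits, experts, shared_expert))
```

For each token, only 8 of the 128 routed experts run alongside the shared expert, which is the mechanism behind the gap between the 25 billion active and 218 billion total parameters.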
| Metric | Specification |
|---|---|
| Total Parameters | 218 Billion |
| Active Parameters | 25 Billion per token |
| Experts | 128 (8 active per token + 1 shared) |
| Context Window | 128K input, 64K output |
| Quantization | W4A4 (4-bit weights, 4-bit activations) |
| Throughput | 375 tokens per second |
| Time-to-First-Token | 113 milliseconds |
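The throughput and time-to-first-token figures can be combined into a rough latency estimate for a single request. This is a back-of-the-envelope sketch, assuming the 375 tokens per second is a steady per-stream decode rate; the article does not state batch size, prompt length, or hardware for these measurements, and `estimated_latency_s` is a hypothetical helper.

```python
TTFT_S = 0.113       # time-to-first-token from the table (113 ms)
DECODE_TPS = 375.0   # throughput from the table, assumed to be per-stream decode speed


def estimated_latency_s(output_tokens: int) -> float:
    """Rough wall-clock time: the first token arrives after TTFT, the rest at DECODE_TPS."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return TTFT_S + (output_tokens - 1) / DECODE_TPS


for n in (1, 1_000, 10_000):
    print(f"{n:>6} tokens -> {estimated_latency_s(n):.2f} s")
```

Real latency also depends on prompt length (which drives TTFT), concurrency, and network overhead, so treat the result as a lower bound under the stated assumption.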
The model relies on W4A4 quantization to fit inference onto a single NVIDIA Blackwell B200 or dual NVIDIA H100 GPUs. If you manage AI inference deployments, the 4-bit optimization makes it viable to run a 200B+ class model on standard enterprise nodes without specialized cluster configurations.
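The hardware claim follows from simple weight-memory arithmetic. The sketch below counts weight storage only; the per-GPU capacities (80 GB for the H100 SKU, 192 GB for the B200) are assumptions to verify against your hardware, and the 4 bits per weight ignore quantization scale overhead, GB/GiB differences, and memory for KV cache and activations.

```python
PARAMS = 218e9  # total parameters, per the article

# Assumed per-GPU memory in GB; check your exact SKU.
GPU_MEMORY_GB = {"H100": 80, "B200": 192}


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def headroom_gb(weights: float, gpu: str, count: int) -> float:
    """Memory left for KV cache and activations; negative means the weights don't fit."""
    return GPU_MEMORY_GB[gpu] * count - weights


w4 = weight_gb(PARAMS, 4)    # 4-bit weights
w16 = weight_gb(PARAMS, 16)  # 16-bit baseline for comparison
for gpu, count in (("B200", 1), ("H100", 2), ("H100", 1)):
    print(gpu, count, round(headroom_gb(w4, gpu, count), 1), round(headroom_gb(w16, gpu, count), 1))
```

Under these assumptions the 4-bit weights take about 109 GB, leaving roughly 83 GB on one B200 and 51 GB across two H100s, while a single H100 falls short. At 16-bit precision the weights would need about 436 GB, which fits in none of these configurations.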
Agentic Capabilities and Multilingual Support
The model accepts up to 128K input tokens and can generate up to 64K output tokens. It natively handles both text and image inputs, with a focus on extracting data from charts and technical manuals.
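When feeding long manuals into the model, a client typically validates prompt size and splits oversized documents before sending them. This is a minimal sketch under stated assumptions: "128K" and "64K" are read as 128,000 and 64,000 tokens (the true limits may be 131,072 and 65,536), the input and output limits are treated as independent, as the article phrases them, and the helpers `clamp_output` and `chunk_tokens` are hypothetical. Image inputs also consume context, at a cost the article does not specify.

```python
# Assumed limits: "128K"/"64K" read as decimal thousands; check the model card.
MAX_INPUT_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 64_000


def clamp_output(prompt_tokens: int, requested_output: int) -> int:
    """Validate the prompt size and cap the requested generation length."""
    if prompt_tokens > MAX_INPUT_TOKENS:
        raise ValueError(f"prompt of {prompt_tokens} tokens exceeds {MAX_INPUT_TOKENS}")
    return min(requested_output, MAX_OUTPUT_TOKENS)


def chunk_tokens(tokens, reserved_tokens):
    """Split a long token sequence into chunks that fit beside a fixed instruction block."""
    budget = MAX_INPUT_TOKENS - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved_tokens leaves no room for document content")
    return [tokens[i:i + budget] for i in range(0, len(tokens), budget)]


manual = list(range(300_000))  # stand-in for a tokenized 300K-token manual
chunks = chunk_tokens(manual, reserved_tokens=2_000)
print(len(chunks), [len(c) for c in chunks])
print(clamp_output(prompt_tokens=120_000, requested_output=100_000))
```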
Command A+ is tuned for tool use and Retrieval-Augmented Generation (RAG) tasks with verifiable citations. Benchmark results show a sharp improvement in tool calling: the model scored 85% on the $\tau^2$-Bench Telecom evaluation, up from 37% for its predecessor. On the AIME 25 math benchmark, it scored 90%, up from 57%.
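The article does not describe the format of Command A+ citations. As a hypothetical sketch, assuming a RAG pipeline returns each citation as a document ID plus the quoted supporting span, a client can flag citations that are not grounded in the retrieved text. The `Citation` record and `unverified_citations` helper are illustrative names, not part of any Cohere API.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical citation record: a cited document ID plus the quoted support."""
    doc_id: str
    quote: str


def _normalize(text: str) -> str:
    """Collapse whitespace and case so line breaks in sources don't cause false failures."""
    return " ".join(text.split()).lower()


def unverified_citations(citations, documents):
    """Return citations whose quote is absent from the cited document, or whose document is missing."""
    failures = []
    for cite in citations:
        source = documents.get(cite.doc_id)
        if source is None or _normalize(cite.quote) not in _normalize(source):
            failures.append(cite)
    return failures


docs = {
    "manual-7": "Torque the housing bolts to 25 Nm.\nRecheck after 24 hours.",
}
answer_citations = [
    Citation("manual-7", "housing bolts to 25 Nm. Recheck"),  # spans a line break
    Citation("manual-7", "torque to 30 Nm"),                  # not in the source
    Citation("manual-9", "anything"),                         # unknown document
]
print(unverified_citations(answer_citations, docs))
```

Case- and whitespace-insensitive substring matching is a deliberately lenient choice; a stricter check could require exact character offsets into the source document.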
The tokenizer now natively supports 48 languages, more than double the number previously supported. Coverage includes all official European Union languages, alongside Japanese, Arabic, and Hindi.
Reliant AI Acquisition and Pharma Integration
Cohere acquired Reliant AI to verticalize its enterprise offerings for the biopharma sector. Founded in 2023 by former DeepMind and Google Brain researchers, the startup brings 30 employees to Cohere. Karl Moritz Hermann joins as VP of AI Verticalizations in Berlin, while Marc Bellemare takes the VP of Modeling role in Montreal.
The acquisition bundles Reliant Tabular into a new suite called North for Pharma. This product automates systematic literature reviews, competitive landscaping, and regulatory data extraction. Existing clients like GSK, Kyowa Kirin, and Medicus Pharma will transition directly to Cohere infrastructure.
European Sovereign AI Infrastructure
Alongside the technical releases, Cohere signed a Memorandum of Understanding with the Indra Group. Backed by the Canadian and Spanish governments, the partnership integrates Cohere models into the IndraMind initiative. The collaboration focuses on building private, air-gapped AI environments for critical assets in Europe. If you work on multi-agent systems for defense or utilities, this provides a framework for deploying high-capability models entirely offline.
For developers in regulated environments, the transition of Command A+ to an Apache 2.0 license removes the primary legal barrier to full local control. You can now build multimodal RAG pipelines and autonomous tools on hardware you own, without routing sensitive data through external APIs.