Arcee's Trinity-Large-Thinking model defies big tech in US
The Trinity-Large-Thinking model offers a low-cost, open-source alternative for OpenClaw users following Anthropic's recent subscription policy changes.
U.S. startup Arcee AI has released Trinity-Large-Thinking, a 400-billion-parameter open-source model designed specifically for complex agent workflows. The release coincides with Anthropic blocking approximately 135,000 users of the OpenClaw framework from its flat-rate Claude subscription tier. For developers who rely on autonomous tool calling, the change immediately alters the economics of heavily automated systems.
Architecture and Training
Arcee built the model on a sparse Mixture-of-Experts (MoE) architecture to balance frontier-scale capacity with inference efficiency. The system contains 399 billion total parameters but activates only 13 billion per token, routing each token to 4 of its 256 experts. The 262,144-token context window can be expanded to 512,000 tokens in specific configurations.
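The 4-of-256 routing described above can be illustrated with a minimal top-k gating sketch. This is generic MoE routing, not Arcee's actual implementation; the shapes and router are toy values chosen for the example.

```python
import numpy as np

def route_tokens(hidden, router_weights, k=4):
    """Pick the top-k experts per token and softmax-normalize their gates.

    hidden: (tokens, d_model); router_weights: (d_model, n_experts).
    Returns (indices, gates), each of shape (tokens, k).
    """
    logits = hidden @ router_weights                       # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]          # top-k expert ids
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)             # gates sum to 1
    return top_idx, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal((8, 64))       # 8 tokens, toy hidden width
router = rng.standard_normal((64, 256))     # 256 experts, as in Trinity
idx, gates = route_tokens(hidden, router, k=4)
print(idx.shape, gates.shape)               # only 4 of 256 experts fire per token
```

Because only the selected experts' weights are touched per token, active-parameter count (13B here) rather than total parameter count (399B) drives per-token compute.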
Training required 33 days on 2,048 NVIDIA B300 Blackwell GPUs with a $20 million budget. The team used the Muon optimizer alongside a novel load-balancing strategy called SMEBU (Soft-clamped Momentum Expert Bias Updates), a combination that prevents expert collapse during training. The final weights are distributed under the Apache 2.0 license. If you deploy MoE architectures locally, this permissive license allows for fully air-gapped commercial hosting.
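Arcee has not published SMEBU's exact algorithm, but the name suggests a bias-based load balancer in the spirit of auxiliary-loss-free routing: per-expert routing biases are nudged toward balanced load using a momentum term and a soft clamp on the step size. The sketch below is a plausible reading of that idea, with every detail (learning rate, tanh clamp, update rule) assumed for illustration.

```python
import numpy as np

def smebu_style_update(bias, momentum, expert_load, target_load,
                       lr=0.01, beta=0.9, clamp=1.0):
    """One hypothetical SMEBU-style step.

    expert_load: fraction of tokens each expert received this batch.
    Overloaded experts get a negative bias push (fewer tokens next batch),
    underloaded experts a positive one. tanh soft-clamps the step so no
    single update can swing a bias by more than lr * clamp.
    """
    error = target_load - expert_load               # positive if underloaded
    momentum = beta * momentum + (1 - beta) * error
    bias = bias + lr * clamp * np.tanh(momentum / clamp)
    return bias, momentum

n_experts = 256
bias = np.zeros(n_experts)
mom = np.zeros(n_experts)
load = np.full(n_experts, 1 / n_experts)
load[0] = 0.05                                      # expert 0 is running hot
load /= load.sum()
bias, mom = smebu_style_update(bias, mom, load, 1 / n_experts)
print(bias[0])                                      # negative: hot expert demoted
```

The appeal of bias-based balancing over an auxiliary loss is that it steers routing without adding a competing gradient signal to the language-modeling objective, which is one way expert collapse is avoided.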
Agent Benchmarks and Performance
Trinity-Large-Thinking targets multi-turn tool calling and long-horizon planning rather than general chat. Its benchmark profile reflects this optimization. It closely trails top-tier closed models across major evaluations.
| Benchmark | Trinity-Large-Thinking | Claude Opus 4.6 |
|---|---|---|
| PinchBench | 91.9 | 93.3 |
| IFBench | 52.3 | 53.1 |
| SWE-bench Verified | 63.2 | 75.6 |
The model also scored 96.3 on AIME25, matching the top-tier Chinese model Kimi-K2.5. The gap on SWE-bench Verified indicates that while the model handles general tool orchestration well, it lags Anthropic's flagship on complex software engineering tasks. If you are evaluating autonomous systems, account for this gap in raw coding capability.
The Anthropic Policy Shift and Migration
The rapid adoption curve for the new model is tightly coupled to policy changes at Anthropic. On April 3, 2026, Anthropic restricted OpenClaw traffic from its $200 per month flat-rate subscription tier. Users running continuous agent loops were forced onto standard API billing. Typical heavy users reported projected costs exceeding $600 per month under the new structure.
Following OpenClaw version 2026.4.7, which added native support for Arcee, Trinity-Large-Thinking became the most used open model in the U.S. on OpenRouter. The migration is driven almost entirely by economics. Arcee serves the model at $0.90 per million output tokens, a 96% reduction compared to Claude Opus 4.6 at $25 per million output tokens. If your application relies on complex multi-agent orchestration, this pricing fundamentally changes your allowable token budget per task.
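The arithmetic behind that claim is straightforward; this snippet works only from the prices and the $600 projected bill quoted in this article, and looks at output tokens alone (input-token pricing is not cited here).

```python
claude_out = 25.00   # USD per million output tokens (Claude Opus 4.6, per article)
trinity_out = 0.90   # USD per million output tokens (Arcee-served Trinity)

reduction = 1 - trinity_out / claude_out
print(f"price reduction: {reduction:.1%}")          # 96.4%, matching the ~96% quoted

# What a fixed monthly spend buys at each rate
budget = 600.0       # the projected heavy-user API bill cited above
print(budget / claude_out, "M output tokens/month on Claude")    # 24.0
print(budget / trinity_out, "M output tokens/month on Trinity")  # ~666.7
```

Roughly a 28x larger output-token budget for the same spend is what makes always-on agent loops, which burn tokens continuously, viable again.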
Operating heavy autonomous loops on closed APIs exposes your infrastructure to sudden policy shifts and unpredictable billing spikes. Transitioning your agent tooling to Trinity-Large-Thinking secures a stable cost baseline without relying on foreign-built alternatives. Test your specific workflow against the model’s SWE-bench limitations before migrating production systems that rely heavily on zero-shot code generation.
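For such a test, OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so an agent workload can be pointed at the model by swapping the base URL and model slug. The sketch below only builds the request payload; the model slug and the `read_file` tool are hypothetical placeholders, so check OpenRouter's catalog and your own tool schema before using it.

```python
import json

# Hypothetical slug; confirm the real identifier in OpenRouter's model catalog.
MODEL = "arcee-ai/trinity-large-thinking"
BASE_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, tools=None):
    """Build an OpenAI-compatible chat payload for an agent turn."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if tools:
        payload["tools"] = tools   # standard OpenAI tool-calling schema
    return payload

# Example tool definition for a file-reading agent step (illustrative only).
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the agent's workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]
print(json.dumps(build_request("List open TODOs in src/", tools), indent=2))
```

Because the payload shape is unchanged from the Claude-via-API setup, an A/B run against your SWE-bench-like tasks is mostly a matter of routing the same tool-calling traffic to both endpoints and comparing pass rates and cost.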