
Arcee's Trinity-Large-Thinking model challenges Big Tech in the US

The Trinity-Large-Thinking model offers a low-cost, open-source alternative for OpenClaw users following Anthropic's recent subscription policy changes.

U.S. startup Arcee AI has released Trinity-Large-Thinking, a 400-billion-parameter open-source model designed specifically for complex agent workflows. The release coincides with Anthropic blocking approximately 135,000 users of the OpenClaw framework from its flat-rate Claude subscription tier. For developers relying on autonomous tool calling, that change has an immediate impact on the economic viability of heavily automated systems.

Architecture and Training

Arcee built the model using a sparse Mixture-of-Experts (MoE) architecture to balance frontier-scale capacity with inference efficiency. The system contains 399 billion total parameters but activates only 13 billion per token, using a 4-of-256 expert routing strategy. The 262,144-token context window can be expanded to 512,000 tokens in specific configurations.
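The routing idea behind that sparsity can be sketched in a few lines: for each token, a router scores all 256 experts, keeps only the top 4, and renormalizes their gate weights. The expert count and k come from the article; everything else below is a generic top-k routing illustration, not Arcee's actual router code.

```python
import numpy as np

def top_k_route(logits, k=4):
    """Select the top-k experts for one token and renormalize their gates.

    Mirrors the 4-of-256 sparse routing described for Trinity-Large-Thinking.
    `logits` is the router's score vector over all experts; only the chosen
    k experts ever run, which is why active parameters stay far below totals.
    """
    top = np.argsort(logits)[-k:][::-1]            # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                           # softmax over the selected experts only
    return top, gates

rng = np.random.default_rng(0)
logits = rng.normal(size=256)                      # router scores for one token, 256 experts
experts, gates = top_k_route(logits, k=4)
print(len(experts), gates.round(3))                # 4 experts; gate weights sum to 1
```

Because only 4 of 256 expert blocks execute per token, compute per forward pass scales with the 13B active parameters rather than the 399B total.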

Training required 33 days on 2,048 NVIDIA B300 Blackwell GPUs with a $20 million budget. The team used the Muon optimizer alongside SMEBU (Soft-clamped Momentum Expert Bias Updates), a novel load-balancing strategy; the combination prevents expert collapse during training. The final weights are distributed under the Apache 2.0 license. If you deploy MoE architectures locally, this permissive license allows for fully air-gapped commercial hosting.
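Arcee has not published SMEBU's exact update rule in this article, but its name suggests a family of auxiliary-loss-free balancers in which each expert carries a routing bias that is nudged down when the expert is overloaded, smoothed by momentum, and soft-clamped to stay bounded. The sketch below is a hypothetical illustration of that general idea only; the learning rate, momentum coefficient, and tanh clamp are all assumptions, not Arcee's published method.

```python
import numpy as np

def expert_bias_update(bias, momentum, expert_load, lr=0.01, beta=0.9, clamp=1.0):
    """Hypothetical sketch of momentum-based expert-bias load balancing.

    Not the real SMEBU update (which is unpublished here). It shows the idea
    the name implies: overloaded experts receive a negative bias nudge so the
    router sends future tokens elsewhere, with momentum smoothing and a tanh
    soft clamp keeping biases bounded.
    """
    target = expert_load.mean()                     # ideal uniform load per expert
    grad = expert_load - target                     # positive for overloaded experts
    momentum = beta * momentum + (1 - beta) * grad  # smooth the noisy per-batch load signal
    bias = bias - lr * momentum                     # push routing away from hot experts
    bias = clamp * np.tanh(bias / clamp)            # soft clamp: bounded, differentiable
    return bias, momentum

bias = np.zeros(4)
momentum = np.zeros(4)
load = np.array([10.0, 0.0, 5.0, 5.0])              # expert 0 overloaded, expert 1 starved
bias, momentum = expert_bias_update(bias, momentum, load)
print(bias)                                          # expert 0's bias drops, expert 1's rises
```

Bias-based balancing of this kind avoids adding an auxiliary loss term that would compete with the language-modeling objective, which is one known way to prevent expert collapse.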

Agent Benchmarks and Performance

Trinity-Large-Thinking targets multi-turn tool calling and long-horizon planning rather than general chat. Its benchmark profile reflects this optimization. It closely trails top-tier closed models across major evaluations.

Benchmark            Trinity-Large-Thinking  Claude Opus 4.6
PinchBench           91.9                    93.3
IFBench              52.3                    53.1
SWE-bench Verified   63.2                    75.6

The model also scored 96.3 on AIME25, matching the top-tier Chinese model Kimi-K2.5. The gap on SWE-bench Verified indicates that while the model handles general tool orchestration well, it lags behind Anthropic's flagship model on complex software engineering tasks. If you are evaluating autonomous systems, you must account for this gap in pure coding capability.

The Anthropic Policy Shift and Migration

The rapid adoption curve for the new model is tightly coupled to policy changes at Anthropic. On April 3, 2026, Anthropic restricted OpenClaw traffic from its $200 per month flat-rate subscription tier. Users running continuous agent loops were forced onto standard API billing. Typical heavy users reported projected costs exceeding $600 per month under the new structure.

Following OpenClaw version 2026.4.7, which added native support for Arcee, Trinity-Large-Thinking became the most used open model in the U.S. on OpenRouter. The economics drive this migration entirely. Arcee serves the model at $0.90 per million output tokens. This represents a 96% reduction compared to Claude Opus 4.6, which costs $25 per million output tokens. If your application relies on complex multi-agent orchestration, this pricing fundamentally changes your allowable token budget per task.
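The per-token prices above are enough to reproduce the claimed reduction and project monthly bills. The two prices come from the article; the 30-million-token monthly volume is a hypothetical workload chosen only to make the comparison concrete.

```python
def monthly_cost(output_tokens_millions: float, price_per_million: float) -> float:
    """Output-token spend for a month, ignoring input-token and fixed costs."""
    return output_tokens_millions * price_per_million

ARCEE = 0.90   # $/M output tokens, Arcee-hosted Trinity-Large-Thinking (per article)
OPUS = 25.00   # $/M output tokens, Claude Opus 4.6 (per article)

reduction = 1 - ARCEE / OPUS
print(f"{reduction:.0%}")  # 96%

# Hypothetical heavy agent loop emitting 30M output tokens per month:
print(f"${monthly_cost(30, OPUS):.2f} vs ${monthly_cost(30, ARCEE):.2f}")
```

At that assumed volume the Opus bill lands in the hundreds of dollars while the Arcee bill stays in the tens, which is the scale of difference that makes continuous agent loops viable again after the subscription change.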

Operating heavy autonomous loops on closed APIs exposes your infrastructure to sudden policy shifts and unpredictable billing spikes. Transitioning your agent tooling to Trinity-Large-Thinking secures a stable cost baseline without relying on foreign-built alternatives. Test your specific workflow against the model’s SWE-bench limitations before migrating production systems that rely heavily on zero-shot code generation.
