AI Engineering · 5 min read

How to Implement the Advisor Strategy with Claude

Optimize AI agents by pairing high-intelligence advisor models with cost-effective executors using Anthropic's native advisor tool API.

Anthropic’s new advisor strategy provides a native architectural pattern to optimize the balance between high-level reasoning and operational cost for AI agents. By pairing a frontier model like Claude Opus with an efficient model like Claude Sonnet or Haiku in a single API request, you can achieve near-frontier intelligence at a fraction of the cost. This setup reduces the manual overhead of routing prompts between different models. This tutorial covers how to configure the native advisor tool, manage token billing, and assess when this architectural pattern fits your workload.

Inverting the Orchestrator Pattern

The advisor strategy fundamentally alters how multi-agent systems process tasks. Traditionally, an orchestrator pattern uses a massive model to manage state and delegate specific sub-tasks to smaller worker models. The advisor strategy inverts this relationship completely.

Under the new paradigm, the smaller, more cost-effective model acts as the executor. The executor handles the end-to-end task loop. It calls tools, reads the results, and iterates continuously without intervention.

The frontier model acts strictly as the advisor. It remains dormant during standard operations. The executor only escalates to the advisor when it encounters a complex decision or reaches a specific bottleneck in the workflow. The advisor never produces user-facing output. It does not call tools directly. It simply evaluates the executor’s state and provides a structured plan, a directional correction, or a stop signal before handing control back to the execution loop.

Developers view this approach as a native implementation of routing patterns previously handled by third-party frameworks. It offers a reliable middle ground for enterprise teams that face high API bills but still want autonomous decision-making inside the execution loop.
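The division of labor described above can be sketched as a toy loop. This is purely conceptual: in the real feature the server runs this loop internally, and every function name here is an illustrative placeholder, not SDK code.

```python
# Conceptual simulation of the executor/advisor escalation loop.
# All names are illustrative stand-ins, not real API calls.
from dataclasses import dataclass

@dataclass
class Step:
    action: str
    needs_escalation: bool = False

def executor_step(history):
    """Stand-in for the small executor model choosing its next action."""
    if len(history) == 2:
        # Simulated bottleneck: the executor is stuck and wants guidance.
        return Step("unsure", needs_escalation=True)
    return Step(f"tool_call_{len(history)}")

def advisor_review(history):
    """Stand-in for the frontier advisor: returns text guidance, never acts."""
    return "plan: retry with a narrower query"

def run_task(max_steps=5, max_advisor_uses=1):
    history, advisor_uses = [], 0
    for _ in range(max_steps):
        step = executor_step(history)
        if step.needs_escalation and advisor_uses < max_advisor_uses:
            advisor_uses += 1
            # Guidance is injected back into the executor's context.
            history.append(("advisor", advisor_review(history)))
        else:
            history.append(("executor", step.action))
    return history, advisor_uses

history, uses = run_task()
print(uses)
```

The key property the sketch preserves: the advisor produces only text guidance that re-enters the executor's context, while tool actions always come from the executor.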

Technical Implementation and API Setup

Integrating this pattern relies on a native server-side tool available directly within the standard /v1/messages endpoint. Anthropic provides the advisor tool to manage the handoff without requiring manual context management or extra network round-trips.

To access this feature, you must pass a specific beta header in your API request. Set the header to anthropic-beta: advisor-tool-2026-03-01 to enable the routing logic. You then specify the exact tool configuration using the advisor_20260301 parameter. The official advisor strategy documentation covers the precise JSON schema required for defining your tools within this nested structure.

Because the handoff occurs server-side within a single request, the system handles context synchronization automatically. The executor maintains its running context, and the advisor reads that state upon invocation.
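Putting the pieces together, a request might look like the following. The beta header value and the `advisor_20260301` tool type come from the text above; the field names inside the tool block (such as `advisor_model`) and the model IDs are assumptions for illustration, so check the official schema before relying on them.

```python
# Sketch of a /v1/messages request enabling the advisor tool.
# Header and tool type are from the article; inner field names are assumed.
import json

headers = {
    "x-api-key": "YOUR_API_KEY",              # placeholder
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "advisor-tool-2026-03-01",
}

payload = {
    "model": "claude-sonnet-4-6",             # executor (assumed model ID)
    "max_tokens": 4096,
    "tools": [
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "advisor_model": "claude-opus-4-6",  # assumed field name
            "max_uses": 3,                    # cap escalations per run
        }
    ],
    "messages": [
        {"role": "user", "content": "Fix the failing test in this repo."}
    ],
}

print(json.dumps(payload["tools"][0], indent=2))
```

Because the tool is server-side, no extra client logic is needed: the single request above covers both the executor loop and any advisor escalations.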

Cost Management and Usage Controls

The primary benefit of this routing approach is predictable cost reduction. Tokens are billed strictly at the respective rates for each model. Advisor tokens run at Claude Opus rates, while executor tokens run at Claude Sonnet or Claude Haiku rates.

You control the frequency of escalation by setting a max_uses parameter in your request. This parameter establishes a hard limit on the number of times the executor can invoke the advisor within a single run. This prevents runaway loops where a confused executor repeatedly pings the expensive frontier model.

A typical advisor intervention generates a plan of 400 to 700 tokens. To make the cost split visible, the standard API response includes an isolated usage block that tracks advisor tokens separately from executor tokens. This distinct tracking makes it straightforward to monitor your application and calculate the exact financial impact of your routing configuration.
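With a split usage block, computing the blended cost of a run is simple arithmetic. In this sketch both the usage field names and the per-million-token prices are illustrative placeholders, not the real response schema or published rates.

```python
# Sketch: blended cost from a hypothetical split usage block.
# Field names and $/million-token rates below are placeholders.
usage = {
    "executor": {"input_tokens": 12_000, "output_tokens": 3_500},
    "advisor": {"input_tokens": 8_000, "output_tokens": 550},  # ~400-700 token plan
}

RATES_PER_MTOK = {
    "executor": {"input": 3.00, "output": 15.00},   # e.g. a Sonnet-class rate
    "advisor": {"input": 15.00, "output": 75.00},   # e.g. an Opus-class rate
}

def cost_usd(usage, rates):
    total = 0.0
    for role, tokens in usage.items():
        total += tokens["input_tokens"] / 1e6 * rates[role]["input"]
        total += tokens["output_tokens"] / 1e6 * rates[role]["output"]
    return total

print(f"${cost_usd(usage, RATES_PER_MTOK):.4f}")
```

Logging this per-run figure alongside `max_uses` settings lets you see directly how much each advisor escalation adds to a task's cost.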

Performance and Evaluation Benchmarks

Anthropic’s evaluation data demonstrates the strategy’s impact on benchmark performance and cost efficiency across different model pairings.

| Configuration | Benchmark | Performance | Cost Impact |
| --- | --- | --- | --- |
| Sonnet + Opus Advisor | SWE-bench Multilingual | +2.7 percentage points vs. Sonnet solo | -11.9% cost per task |
| Haiku + Opus Advisor | BrowseComp | 41.2% (vs. 19.7% for Haiku solo) | 85% cheaper than Sonnet solo |

The Haiku configuration shows the most dramatic shift. By relying on Haiku for standard web navigation tasks and bringing in Opus only for complex reasoning, the system more than doubles the baseline performance of Haiku alone. The resulting architecture costs 85% less than running the task entirely on Claude Sonnet.

The Sonnet configuration provides a smaller percentage boost in raw performance but still manages to outperform a pure Sonnet implementation while dropping the cost per task by nearly 12%.

Production Context and Limitations

The release of the advisor strategy coincides with several major shifts in the Claude Platform ecosystem. Anthropic confirmed the general availability of 1 million token context windows for Claude Opus 4.6 and Sonnet 4.6 earlier in April 2026. This expanded context makes the advisor pattern highly viable for long-running codebase operations that previously required aggressive context pruning. Older models, specifically Sonnet 3.7 and Haiku 3.5, have been retired.

You must balance these new capabilities against specific system constraints. The advisor model cannot execute functions directly. Your executor model must still possess sufficient formatting capability to accurately write tool calls based on the advisor’s text-based guidance. If the executor struggles with strict JSON formatting, the high-level plan from the advisor will not prevent a tool-call failure.
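Because the advisor cannot repair a malformed tool call after the fact, it can pay to validate the executor's output defensively before dispatching it. The sketch below shows one way to do that; the expected call shape (`name` plus an `input` object) mirrors common tool-call conventions and is an assumption, not a documented contract.

```python
# Sketch: defensive validation of an executor's tool call before dispatch.
# The {"name": ..., "input": {...}} shape is an assumed convention.
import json

def parse_tool_call(raw, known_tools):
    """Return a validated tool call dict, or None to trigger a retry."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON: re-prompt the executor, don't dispatch
    if call.get("name") not in known_tools:
        return None  # hallucinated or misspelled tool name
    if not isinstance(call.get("input"), dict):
        return None  # arguments must be a JSON object
    return call

good = parse_tool_call('{"name": "search", "input": {"q": "logs"}}', {"search"})
bad = parse_tool_call('{"name": "search", "input": ', {"search"})
```

A cheap guard like this keeps a formatting slip from wasting an entire advisor-guided run.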

Anthropic also released a public beta for Claude Managed Agents. This fully managed harness features secure sandboxing and server-sent event (SSE) streaming. Teams must decide whether to build a custom advisor loop via the /v1/messages API or adopt the heavier managed infrastructure. Furthermore, highly secure environments may look toward the new Claude Mythos preview. Launched for the invitation-only Project Glasswing, this high-tier defensive cybersecurity model operates on a separate performance tier entirely.

Review your current agent execution logs to identify tasks where a smaller model frequently fails or enters infinite tool-calling loops. Configure the advisor tool with a strict max_uses limit on those specific endpoints to measure the immediate impact on task completion rates.

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.
