
Anthropic Makes Claude's 1M Token Context Generally Available

Anthropic made 1M-token context generally available for Claude Opus 4.6 and Sonnet 4.6, removing the long-context premium and extending standard throughput and rate limits to large code and agent tasks.

Anthropic made 1M-token context generally available for Claude Opus 4.6 and Claude Sonnet 4.6 on March 13, 2026. The practical change is commercial and operational: standard pricing now applies across the full 1M window, the old beta header is gone, standard rate limits apply, and per-request media limits rose from 100 to 600 images or PDF pages. For developers building coding agents, long-running sessions, and large-document workflows, this makes Claude’s 1M context materially easier to use in production.

GA Changes

Anthropic’s announcement turns a beta capability introduced in February into a general-availability feature on the Claude Platform, with rollout also stated for Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry. The key change is the removal of the long-context premium. A 900K-token request is billed at the same per-token rate as a 9K-token request. Current published pricing: Claude Opus 4.6 at $5 / MTok input and $25 / MTok output, Claude Sonnet 4.6 at $3 / MTok input and $15 / MTok output. For applications that regularly cross 200K tokens, marginal cost drops sharply. A 900K-token input on Sonnet 4.6 now stays at $3 / MTok instead of the previous premium rate.
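A quick back-of-the-envelope using the published rates above shows what the flat rate means for a single large request (the token counts are illustrative):

```python
# Flat per-token pricing across the full 1M window. Rates are the Sonnet 4.6
# numbers quoted above; token counts are illustrative.
SONNET_INPUT_PER_MTOK = 3.00    # $ per 1M input tokens
SONNET_OUTPUT_PER_MTOK = 15.00  # $ per 1M output tokens

input_tokens = 900_000   # e.g. a large repo plus accumulated session history
output_tokens = 4_000    # e.g. a review summary or a patch

cost = (input_tokens / 1_000_000) * SONNET_INPUT_PER_MTOK \
     + (output_tokens / 1_000_000) * SONNET_OUTPUT_PER_MTOK
print(f"${cost:.2f}")  # $2.76, at the same per-token rate as a small request
```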

If you build AI agents that carry forward long traces, code history, tool outputs, and planning state, this directly changes your cost envelope.

Impact on Coding Workflows

Anthropic tied the rollout to Claude Code. 1M context is now included in Claude Code for Max, Team, and Enterprise users with Opus 4.6, and sessions can use the full window automatically. Coding sessions accumulate context differently from standard chat. A serious agent loop can include repository maps, diffs, test failures, tool outputs, stack traces, prior plans, and long conversational state. Once the window fills, systems typically compact, summarize, or drop older content.
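Claude Code's actual compaction logic is not public, but the trade-off is easy to see in a generic sketch of the threshold check an agent loop might run before each turn. The function name and the 80% threshold here are assumptions for illustration:

```python
# Generic sketch of a compaction decision in an agent loop -- not Claude
# Code's published implementation; names and the 80% threshold are assumptions.
COMPACT_AT = 0.80  # fraction of the window that triggers summarization

def maybe_compact(history_tokens: int, window_tokens: int, summarize) -> bool:
    """Summarize older turns once the transcript nears the window limit."""
    if history_tokens < window_tokens * COMPACT_AT:
        return False  # room left: keep exact diffs, logs, and stack traces
    summarize()       # lossy step: exact text such as API signatures can drop out
    return True

# The same 180K-token session trips the threshold at a 200K window
# but is nowhere near it at 1M.
print(maybe_compact(180_000, 200_000, lambda: None))    # True
print(maybe_compact(180_000, 1_000_000, lambda: None))  # False
```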

Anthropic says customers saw a 15% decrease in compaction events. A larger affordable window means less forced summarization and fewer opportunities to lose details that matter for debugging, code review, or multi-step task execution. Fewer compactions usually mean more stable agent memory across long sessions, less summarization overhead, and better preservation of exact text such as API signatures, error logs, and diff context.

Throughput and Media Limits

Customers now get standard account throughput across the entire 1M window. A long context window that only works under reduced throughput is harder to schedule in multi-tenant systems. Standard throughput at 1M means long-context requests become easier to treat as normal traffic, subject to your account’s usual limits rather than a separate slower tier.

The media limit increase from 100 to 600 images or PDF pages per request affects large legal or compliance review, research workflows over image-rich documents, multimodal RAG where you pass retrieved pages directly, and incident analysis that combines screenshots, dashboards, and logs.
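As a sketch of what the higher ceiling enables, here is a single Messages API request that attaches several PDFs as document content blocks. The model ID is an assumption (check the current model list), and the file names are placeholders:

```python
import base64
import pathlib

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def pdf_block(path: str) -> dict:
    """Encode a local PDF as a base64 document content block."""
    data = base64.standard_b64encode(pathlib.Path(path).read_bytes()).decode()
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": data},
    }

# Placeholder file names; up to 600 pages or images can now ride in one request.
pdfs = ["contract_a.pdf", "contract_b.pdf", "contract_c.pdf"]

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            *[pdf_block(p) for p in pdfs],
            {"type": "text", "text": "Compare the termination clauses across these contracts."},
        ],
    }],
)
print(response.content[0].text)
```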

Quality at Long Range

Anthropic says Opus 4.6 scores 78.3% on MRCR v2 at 1M-token context, which it presents as a frontier-leading result. Raw context size does not guarantee usable retrieval performance. A 1M-token window helps only if the model can still find and use relevant information deep inside that prompt. If you work with context windows, this is the core engineering question.
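A cheap sanity check for your own prompts, far less demanding than MRCR but useful as a smoke test, is a needle-at-depth probe: plant a known fact at different positions in a long context and check recall. A minimal sketch, where the model ID is an assumption and the filler is a stand-in for your real documents:

```python
import anthropic

client = anthropic.Anthropic()

# Stand-in filler; use your own documents so token statistics match production,
# and scale the multiplier toward the window size you actually care about.
FILLER = ("The quick brown fox jumps over the lazy dog. " * 20_000).strip()
NEEDLE = "The deployment passphrase is violet-canyon-42."
QUESTION = "What is the deployment passphrase? Answer with the passphrase only."

def found_at_depth(depth: float) -> bool:
    """Plant the needle at `depth` (0.0 = start, 1.0 = end) and check recall."""
    cut = int(len(FILLER) * depth)
    prompt = f"{FILLER[:cut]}\n{NEEDLE}\n{FILLER[cut:]}\n\n{QUESTION}"
    response = client.messages.create(
        model="claude-sonnet-4-6",  # assumed model ID
        max_tokens=50,
        messages=[{"role": "user", "content": prompt}],
    )
    return "violet-canyon-42" in response.content[0].text

for depth in (0.1, 0.5, 0.9):
    print(f"depth {depth:.0%}: {'found' if found_at_depth(depth) else 'missed'}")
```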

The March 13 rollout does not eliminate the need for retrieval, chunking, or prompt design; it changes where the cost and complexity breakpoints sit. For many systems, larger working sets in the prompt, longer raw-session continuity, and larger single-pass analyses become more practical.

If you build coding agents, document analysis systems, or multimodal review pipelines, test a high-context path against your current summarize-and-retrieve baseline. Measure compaction frequency, latency, token cost, and task success on long sessions. The pricing and throughput changes remove two of the biggest production blockers, so the rollout becomes either a real production advantage or just a bigger prompt window depending on your workload.
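The response's usage field makes the latency and cost side of that comparison straightforward to log. A minimal sketch, using the Sonnet 4.6 rates quoted above and an assumed model ID; task success still needs your own evaluation:

```python
import time

import anthropic

client = anthropic.Anthropic()
INPUT_RATE, OUTPUT_RATE = 3.00, 15.00  # $/MTok, Sonnet 4.6 rates quoted above

def timed_call(messages: list[dict]) -> dict:
    """Run one request and report latency, token usage, and dollar cost."""
    start = time.monotonic()
    response = client.messages.create(
        model="claude-sonnet-4-6",  # assumed model ID
        max_tokens=1024,
        messages=messages,
    )
    usage = response.usage
    return {
        "latency_s": round(time.monotonic() - start, 2),
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cost_usd": round(usage.input_tokens / 1e6 * INPUT_RATE
                          + usage.output_tokens / 1e6 * OUTPUT_RATE, 4),
    }

# Run the same task through the high-context path and the
# summarize-and-retrieve baseline, then compare the numbers.
print(timed_call([{"role": "user", "content": "Summarize this repo: ..."}]))
```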
