Anthropic Makes Claude's 1M Token Context Generally Available
Anthropic made 1M-token context GA for Claude 4.6, removing long-context premiums and boosting throughput for large code and agent tasks.
Anthropic made 1M-token context generally available for Claude Opus 4.6 and Claude Sonnet 4.6 on March 13, 2026. The practical change is commercial and operational: standard pricing now applies across the full 1M window, the old beta header is gone, standard rate limits apply, and per-request media limits rose from 100 to 600 images or PDF pages. For developers building coding agents, long-running sessions, and large-document workflows, this makes Claude’s 1M context materially easier to use in production.
GA Changes
Anthropic’s announcement turns a beta capability introduced in February into a general-availability feature on the Claude Platform, with rollout also stated for Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry. The key change is the removal of the long-context premium. A 900K-token request is billed at the same per-token rate as a 9K-token request. Current published pricing: Claude Opus 4.6 at $5 / MTok input and $25 / MTok output, Claude Sonnet 4.6 at $3 / MTok input and $15 / MTok output. For applications that regularly cross 200K tokens, marginal cost drops sharply. A 900K-token input on Sonnet 4.6 now stays at $3 / MTok instead of the previous premium rate.
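The flat-rate math above can be sketched in a few lines. The per-MTok prices are the published Opus 4.6 and Sonnet 4.6 rates quoted in this section; the function name, model keys, and example token counts are illustrative, not part of any SDK.

```python
# Sketch: estimating request cost at the flat per-MTok rates quoted above.
# With the long-context premium gone, one rate applies across the full 1M window.

PRICES_PER_MTOK = {
    "opus-4.6":   {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Flat pricing: the same per-token rate regardless of request size."""
    p = PRICES_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 900K-token input with a 4K-token reply on Sonnet 4.6:
# 900_000 * $3/MTok = $2.70 input, 4_000 * $15/MTok = $0.06 output → $2.76
cost = request_cost("sonnet-4.6", 900_000, 4_000)
```

Because pricing is linear in tokens, a 900K-token request costs exactly 100× a 9K-token request, with no premium tier to model separately.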
If you build AI agents that carry forward long traces, code history, tool outputs, and planning state, this directly changes your cost envelope.
Impact on Coding Workflows
Anthropic tied the rollout to Claude Code. 1M context is now included in Claude Code for Max, Team, and Enterprise users with Opus 4.6, and sessions can use the full window automatically. Coding sessions accumulate context differently from standard chat. A serious agent loop can include repository maps, diffs, test failures, tool outputs, stack traces, prior plans, and long conversational state. Once the window fills, systems typically compact, summarize, or drop older content.
Anthropic says customers saw a 15% decrease in compaction events. A larger affordable window means less forced summarization and fewer opportunities to lose details that matter for debugging, code review, or multi-step task execution. Fewer compactions usually mean more stable agent memory across long sessions, less summarization overhead, and better preservation of exact text such as API signatures, error logs, and diff context.
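The compaction decision described above reduces to a capacity check: compact only when the next turn would overflow the usable window. This is a generic sketch of that check, not Anthropic's implementation; the function name, the output reserve, and the thresholds are illustrative.

```python
# Illustrative compaction trigger for an agent loop: summarize or drop older
# context only when the next turn would push past the model's window.

def needs_compaction(context_tokens: int, next_turn_tokens: int,
                     window: int, reserve_for_output: int = 8_000) -> bool:
    """True when accumulated context plus the next turn (and room for the
    model's reply) would exceed the context window."""
    return context_tokens + next_turn_tokens + reserve_for_output > window

# The same 600K-token coding session: forced to compact under a 200K window,
# untouched under a 1M window.
assert needs_compaction(600_000, 20_000, window=200_000)
assert not needs_compaction(600_000, 20_000, window=1_000_000)
```

A 5× larger window moves this threshold far enough out that many sessions never hit it, which is the mechanism behind fewer compaction events and better preservation of exact text.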
Throughput and Media Limits
Customers now get standard account throughput across the entire 1M window. A long context window that only works under reduced throughput is harder to schedule in multi-tenant systems. Standard throughput at 1M means long-context requests become easier to treat as normal traffic, subject to your account’s usual limits rather than a separate slower tier.
The media limit increase from 100 to 600 images or PDF pages per request affects large legal or compliance review, research workflows over image-rich documents, multimodal RAG where you pass retrieved pages directly, and incident analysis that combines screenshots, dashboards, and logs.
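For pipelines that feed whole documents page by page, the cap change mainly affects how many requests a document needs. A minimal chunking sketch, assuming you already have pages as a list; the 600 figure is the new limit from the announcement, while the helper itself is generic:

```python
# Illustrative helper: split a long page list into requests that respect a
# per-request media cap.

from typing import Iterator, List

def batch_pages(pages: List[str], per_request_limit: int = 600) -> Iterator[List[str]]:
    """Yield successive page batches, each within the per-request limit."""
    for i in range(0, len(pages), per_request_limit):
        yield pages[i : i + per_request_limit]

# A 1,400-page filing: 3 requests at the new 600-page cap,
# versus 14 at the old 100-page cap.
pages = [f"page-{n}" for n in range(1_400)]
new_batches = list(batch_pages(pages, per_request_limit=600))
old_batches = list(batch_pages(pages, per_request_limit=100))
```

Fewer requests per document also means fewer cross-request seams where the model loses context between batches.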
Quality at Long Range
Anthropic says Opus 4.6 scores 78.3% on MRCR v2 at 1M-token context, which it presents as a frontier-leading result. Raw context size does not guarantee usable retrieval performance. A 1M-token window helps only if the model can still find and use relevant information deep inside that prompt. If you work with context windows, this is the core engineering question.
The March 13 rollout does not eliminate the need for retrieval, chunking, or prompt design. It changes where the cost and complexity breakpoints sit. For many systems, larger working sets in prompt, longer raw-session continuity, and larger single-pass analysis become more practical.
If you build coding agents, document analysis systems, or multimodal review pipelines, test a high-context path against your current summarize-and-retrieve baseline. Measure compaction frequency, latency, token cost, and task success on long sessions. The pricing and throughput changes remove two of the biggest production blockers; whether the rollout is a real production advantage or just a bigger prompt window depends on your workload.
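The side-by-side comparison suggested above can be structured around the four metrics listed. This is a measurement-harness sketch, not a benchmark; the dataclass, field names, and `compare` helper are all illustrative placeholders for your own instrumentation.

```python
# Sketch: compare a high-context run against a summarize-and-retrieve baseline
# on the metrics that matter for long sessions.

from dataclasses import dataclass

@dataclass
class RunStats:
    compactions: int      # forced summarization events across the run
    latency_s: float      # mean wall-clock time per task
    cost_usd: float       # mean token spend per task
    successes: int        # tasks completed correctly
    total: int            # tasks attempted

    @property
    def success_rate(self) -> float:
        return self.successes / self.total

def compare(high_context: RunStats, baseline: RunStats) -> dict:
    """Deltas are high-context minus baseline: negative compaction/latency/cost
    deltas and a positive success delta favor the high-context path."""
    return {
        "compaction_delta": high_context.compactions - baseline.compactions,
        "latency_delta_s": high_context.latency_s - baseline.latency_s,
        "cost_delta_usd": high_context.cost_usd - baseline.cost_usd,
        "success_delta": high_context.success_rate - baseline.success_rate,
    }
```

Run both paths over the same long-session task set and keep whichever side of the deltas your workload actually rewards.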