Anthropic Makes Claude's 1M Token Context Generally Available
Anthropic made 1M-token context GA for Claude 4.6, removing long-context premiums and boosting throughput for large code and agent tasks.
Anthropic made 1M-token context generally available for Claude Opus 4.6 and Claude Sonnet 4.6 on March 13, 2026. The practical change is commercial and operational, not just technical: standard pricing now applies across the full 1M window, the old beta header is gone, standard rate limits now apply, and per-request media limits rose from 100 to 600 images or PDF pages. For developers building coding agents, long-running sessions, and large-document workflows, this makes Claude’s 1M context materially easier to use in production.
GA Changes
Anthropic’s announcement turns a beta capability introduced in February into a general-availability feature on the Claude Platform, with rollout also announced for Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.
The key GA changes are concrete:
| Change | Before GA | After GA, March 13 |
|---|---|---|
| Context window | 1M in beta for 4.6 models | 1M GA for Opus 4.6 and Sonnet 4.6 |
| Pricing above 200K tokens | Long-context premium applied | Standard pricing across full 1M |
| Beta header | `context-1m-2025-08-07` header required | No header required |
| Throughput | Lower long-context throughput model | Standard account throughput |
| Media limit | 100 images or PDF pages | 600 images or PDF pages |
Anthropic also says existing integrations do not need changes if they still send the old beta header, because the platform now ignores it.
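That backward compatibility can be sketched as two request setups that now behave identically. The field names follow the public Anthropic Messages API shape, and the model id is an assumption; the header value is the one named in the announcement.

```python
# Sketch of two equivalent request setups after GA, based on the article's
# claim that the platform now ignores the old beta opt-in header. The body
# fields follow the Messages API shape; the model id here is assumed.

def build_request(include_legacy_header: bool) -> dict:
    """Build a Messages API request body plus headers."""
    headers = {"content-type": "application/json"}
    if include_legacy_header:
        # Old beta opt-in; per the GA post this is now a no-op.
        headers["anthropic-beta"] = "context-1m-2025-08-07"
    body = {
        "model": "claude-opus-4-6",  # assumed model id
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "hello"}],
    }
    return {"headers": headers, "body": body}

legacy = build_request(include_legacy_header=True)
modern = build_request(include_legacy_header=False)

# The request bodies are identical; only the now-ignored header differs.
assert legacy["body"] == modern["body"]
```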
The pricing change is the real event
The most important part of the March 13 rollout is the removal of the long-context premium. Anthropic says a 900K-token request is billed at the same per-token rate as a 9K-token request.
Current published pricing for the two 4.6 models is:
| Model | Input price | Output price | Max output |
|---|---|---|---|
| Claude Opus 4.6 | $5 / MTok | $25 / MTok | 128K tokens |
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 64K tokens |
That replaces the earlier beta-era long-context rates above 200K input tokens:
| Model | Previous standard input | Previous long-context input | Previous standard output | Previous long-context output |
|---|---|---|---|---|
| Opus 4.6 | $5 / MTok | $10 / MTok | $25 / MTok | $37.50 / MTok |
| Sonnet 4.6 | $3 / MTok | $6 / MTok | $15 / MTok | $22.50 / MTok |
For applications that regularly cross 200K tokens, that is a large drop in marginal cost. A 900K-token input on Sonnet 4.6 would previously have been billed at the premium long-context rate. Under GA, it stays at the standard $3 / MTok. The same pattern holds for Opus 4.6 at $5 / MTok.
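The savings are easy to quantify from the tables above. This back-of-envelope sketch assumes the beta-era premium applied to the whole request once input crossed 200K tokens, as the 900K-token example in this article implies.

```python
# Input-cost comparison for a 900K-token request, using the per-MTok rates
# from the pricing tables above. Assumes the old premium applied to the
# entire request once input exceeded 200K tokens.

STANDARD = {"opus-4.6": 5.00, "sonnet-4.6": 3.00}   # $/MTok input, GA rate
PREMIUM = {"opus-4.6": 10.00, "sonnet-4.6": 6.00}   # $/MTok input, old >200K tier

def input_cost(model: str, tokens: int, long_context_premium: bool) -> float:
    """Dollar cost of the input side of one request."""
    rate = PREMIUM[model] if long_context_premium and tokens > 200_000 else STANDARD[model]
    return tokens / 1_000_000 * rate

before = input_cost("sonnet-4.6", 900_000, long_context_premium=True)
after = input_cost("sonnet-4.6", 900_000, long_context_premium=False)
# before ≈ $5.40, after ≈ $2.70: the input cost of this request halves
```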
If you build AI agents that carry forward long traces, code history, tool outputs, and planning state, this directly changes your cost envelope.
Impact on Coding Workflows
Anthropic tied the rollout directly to Claude Code. On March 13, it said 1M context is now included in Claude Code for Max, Team, and Enterprise users with Opus 4.6, and that sessions can use the full window automatically.
That matters because coding sessions accumulate context differently from standard chat. A serious agent loop can include repository maps, diffs, test failures, tool outputs, stack traces, prior plans, and long conversational state. Once the window fills, systems typically compact, summarize, or drop older content.
Anthropic says customers saw a 15% decrease in compaction events. That is a vendor-selected example, but the mechanism is straightforward. A larger affordable window means less forced summarization and fewer opportunities to lose details that matter for debugging, code review, or multi-step task execution.
For coding workflows, fewer compactions usually means three operational benefits:
- More stable agent memory across long sessions
- Less summarization overhead, which reduces latency and failure points
- Better preservation of exact text, such as API signatures, error logs, and diff context
If your current coding assistant aggressively compresses session state, this rollout is a reason to re-evaluate how much raw context you keep versus summarize.
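As a concrete illustration of that re-evaluation, here is a minimal compaction check an agent framework might use. The window size comes from the GA announcement; the threshold, output reserve, and function name are all hypothetical.

```python
# Hypothetical compaction policy for an agent loop: with an affordable 1M
# window, compaction can be deferred until a session actually nears the cap.
# The headroom ratio and output reserve are illustrative, not Anthropic's.

CONTEXT_WINDOW = 1_000_000   # GA window for the 4.6 models
OUTPUT_BUDGET = 64_000       # reserve room for the model's reply

def should_compact(session_tokens: int, headroom: float = 0.9) -> bool:
    """Compact only when the session nears the usable input budget."""
    usable = CONTEXT_WINDOW - OUTPUT_BUDGET
    return session_tokens > usable * headroom

# A 250K-token session would have forced compaction under a 200K window;
# under 1M it still has ample headroom.
assert should_compact(250_000) is False
assert should_compact(900_000) is True
```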
The throughput change matters for production systems
Anthropic’s GA post says customers now get their standard account throughput across the entire 1M window. That is easy to miss, but it matters as much as pricing.
A long context window that only works under reduced throughput is harder to schedule in multi-tenant systems. Queue times increase. Bursty agent jobs become harder to manage. Capacity planning gets messy.
Standard throughput at 1M means long-context requests become easier to treat as normal traffic, subject to your account’s usual limits rather than a separate slower tier. If you run background review jobs, incident analysis, or large-batch document processing, this reduces one of the main operational reasons to avoid very large prompts.
Anthropic is also expanding multimodal scope
The media limit increase from 100 to 600 images or PDF pages per request is a substantial change for document-heavy pipelines.
That expansion affects several categories of systems:
- Large legal or compliance review over many PDFs
- Research workflows over image-rich documents
- Multimodal RAG where you pass retrieved pages directly
- Incident and operations analysis that combines screenshots, dashboards, and logs
Anthropic says this higher media cap is available on Claude Platform natively, Azure Foundry, and Vertex AI. In practice, you should still validate the exact cloud-side rollout in your target environment, because documentation around the March 13 announcement showed some lag between the GA post and older beta-era docs.
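The effect of the higher cap on request planning can be sketched as a simple batching calculation. The cap values come from the announcement; the helper itself is hypothetical.

```python
# Illustrative batching of PDF pages under the per-request media cap. The
# cap values come from the GA announcement; the helper is hypothetical.

MEDIA_CAP = 600  # images or PDF pages per request after GA (was 100)

def batch_pages(page_count: int, cap: int = MEDIA_CAP) -> list[range]:
    """Split a document into request-sized page ranges."""
    return [range(start, min(start + cap, page_count))
            for start in range(0, page_count, cap)]

# A 1,500-page review now needs 3 requests instead of 15 under the old cap.
assert len(batch_pages(1_500)) == 3
assert len(batch_pages(1_500, cap=100)) == 15
```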
Long context is only useful if quality holds up
Anthropic’s March 13 post makes a quality claim, not just a capacity claim. The company says Opus 4.6 scores 78.3% on MRCR v2, which it presents as a frontier-leading result at 1M-token context.
Earlier Anthropic materials for the Opus 4.6 launch cited 76% on the 8-needle 1M variant of MRCR v2, compared with 18.5% for Sonnet 4.5. The exact reason for the shift from 76% to 78.3% is not explained in the March 13 post, so the safe reading is narrower: Anthropic continues to present Opus 4.6 as a strong long-context retrieval model at 1M tokens, with published MRCR v2 results in the mid-to-high 70s.
That distinction matters for developers because raw context size does not guarantee usable retrieval performance. A 1M-token window helps only if the model can still find and use the relevant information deep inside that prompt. If you work with context windows, this is the core engineering question, not just how many tokens fit.
Architecture Implications
The March 13 rollout does not eliminate the need for retrieval, chunking, or prompt design. It changes where the cost and complexity breakpoints sit.
For many systems, the new tradeoff looks like this:
| Workflow type | Before March 13 | After March 13 |
|---|---|---|
| Large codebase analysis | More pressure to summarize or retrieve narrowly | More viable to keep larger working set in prompt |
| Long-running coding agents | Compaction and memory pruning required earlier | Longer raw-session continuity becomes affordable |
| Multi-document review | Split into more passes to avoid premium pricing | Larger single-pass analysis becomes more practical |
| Multimodal document workflows | Tighter cap on pages/images per request | Larger direct-ingestion batches become possible |
This affects LLMs used as orchestrators as much as it affects end-user chat. If you maintain agent frameworks, you can revisit policies such as:
- when to compact conversation state
- how much retrieved material to inline versus summarize
- when to persist exact tool output
- whether to batch documents into a single pass or multiple passes
For some workloads, especially code review and investigation-style agents, the answer may shift toward preserving more raw source material.
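The last policy in that list, single pass versus multiple passes, reduces to a token-budget calculation. This sketch assumes a greedy packing strategy and the output budget quoted for Opus 4.6 later in this article; the function name and numbers are illustrative.

```python
# Illustrative batching policy: pack documents into one pass while their
# total tokens fit the window with room reserved for output, otherwise
# start a new pass. Greedy packing is an assumption, not Anthropic's method.

CONTEXT_WINDOW = 1_000_000
OUTPUT_BUDGET = 128_000  # Opus 4.6 max output, per its published limits

def plan_passes(doc_tokens: list[int]) -> int:
    """Return how many passes a set of documents needs, packed greedily."""
    usable = CONTEXT_WINDOW - OUTPUT_BUDGET
    passes, current = 1, 0
    for tokens in doc_tokens:
        if current + tokens > usable:
            passes += 1
            current = 0
        current += tokens
    return passes

# Six 200K-token documents fit in two passes under a 1M window; under a
# 200K window each document would have needed its own pass.
assert plan_passes([200_000] * 6) == 2
```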
The 4.6 feature set makes the 1M rollout more relevant
Anthropic’s current 4.6 docs also list extended thinking, adaptive thinking, and support for the rest of the Claude API feature set. The company recommends `thinking: { type: "adaptive" }` for 4.6 models.
Combined with the 1M GA rollout, this creates a more coherent agent stack: long context, large output budgets, tool use, and sustained reasoning over bigger working sets. Opus 4.6 supports up to 128K output tokens, which is particularly relevant for code transformation, long reports, and structured synthesis. If you rely on structured output, bigger windows plus larger outputs can simplify multi-stage generation pipelines.
There are still constraints. Anthropic’s context-window docs note validation errors instead of silent truncation when prompt plus output exceed the limit, and server-side compaction remains available in beta when conversations approach the cap. You still need token accounting and sane budget controls, especially when multimodal inputs and long outputs combine. This is where tokenization and prompt budgeting remain operational concerns.
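A pre-flight budget check matching that validation behavior is straightforward to keep on the client side, since the API rejects over-budget requests rather than truncating them. The function and message here are illustrative.

```python
# Client-side budget guard mirroring the validation behavior described
# above: the API returns an error when prompt plus requested output exceed
# the window, so checking locally avoids a wasted round trip. Illustrative.

CONTEXT_WINDOW = 1_000_000

def check_budget(prompt_tokens: int, max_output_tokens: int) -> None:
    """Raise before sending a request that the API would reject."""
    total = prompt_tokens + max_output_tokens
    if total > CONTEXT_WINDOW:
        raise ValueError(
            f"request needs {total} tokens but the window is {CONTEXT_WINDOW}"
        )

check_budget(900_000, 64_000)  # fits: 964K is within the 1M window
try:
    check_budget(950_000, 128_000)  # 1,078K exceeds the window
except ValueError as err:
    print(err)
```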
Practical Takeaways
The main open question after March 13 is whether developers will treat 1M context as a default or keep it as an exception path for specific workloads.
The benchmark claims support the idea that Opus 4.6 can still retrieve effectively at long range. The pricing and throughput changes remove two of the biggest production blockers. That combination is what makes this release consequential.
If you build coding agents, document analysis systems, or multimodal review pipelines, test a high-context path against your current summarize-and-retrieve baseline. Measure compaction frequency, latency, token cost, and task success on long sessions. That is where Anthropic’s March 13 GA rollout becomes either a real production advantage or just a bigger prompt window.