
Anthropic Makes Claude's 1M Token Context Generally Available

Anthropic made 1M-token context GA for Claude 4.6, removing long-context premiums and boosting throughput for large code and agent tasks.

Anthropic made 1M-token context generally available for Claude Opus 4.6 and Claude Sonnet 4.6 on March 13, 2026. The practical change is commercial and operational, not just technical: standard pricing now applies across the full 1M window, the old beta header is gone, standard rate limits now apply, and per-request media limits rose from 100 to 600 images or PDF pages. For developers building coding agents, long-running sessions, and large-document workflows, this makes Claude’s 1M context materially easier to use in production.

GA Changes

Anthropic’s announcement turns a beta capability introduced in February into a general-availability feature on the Claude Platform, with rollout also stated for Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.

The key GA changes are concrete:

| Change | Before GA | After GA (March 13) |
| --- | --- | --- |
| Context window | 1M in beta for 4.6 models | 1M GA for Opus 4.6 and Sonnet 4.6 |
| Pricing above 200K tokens | Long-context premium applied | Standard pricing across full 1M |
| Beta header | Required: context-1m-2025-08-07 | No header required |
| Throughput | Lower long-context throughput | Standard account throughput |
| Media limit | 100 images or PDF pages | 600 images or PDF pages |

Anthropic also says existing integrations do not need changes if they still send the old beta header, because the platform now ignores it.
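In practice, backward compatibility means a client can keep its old header logic untouched. A minimal sketch of both request shapes, using the beta header name from the announcement (the helper function itself is illustrative):

```python
def build_headers(api_key: str, send_legacy_beta: bool = False) -> dict:
    """Return HTTP headers for a Claude Messages API call.

    Post-GA, the anthropic-beta context flag is no longer required,
    and per the announcement the platform ignores it if an existing
    integration still sends it.
    """
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    if send_legacy_beta:
        # Harmless after GA: the platform ignores this flag.
        headers["anthropic-beta"] = "context-1m-2025-08-07"
    return headers
```

Either variant reaches the same 1M window, so removing the flag can wait for routine cleanup rather than a coordinated migration.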

The pricing change is the real event

The most important part of the March 13 rollout is the removal of the long-context premium. Anthropic says a 900K-token request is billed at the same per-token rate as a 9K-token request.

Current published pricing for the two 4.6 models is:

| Model | Input price | Output price | Max output |
| --- | --- | --- | --- |
| Claude Opus 4.6 | $5 / MTok | $25 / MTok | 128K tokens |
| Claude Sonnet 4.6 | $3 / MTok | $15 / MTok | 64K tokens |

That replaces the earlier beta-era long-context rates above 200K input tokens:

| Model | Previous standard input | Previous long-context input | Previous standard output | Previous long-context output |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5 / MTok | $10 / MTok | $25 / MTok | $37.50 / MTok |
| Sonnet 4.6 | $3 / MTok | $6 / MTok | $15 / MTok | $22.50 / MTok |

For applications that regularly cross 200K tokens, that is a large drop in marginal cost. A 900K-token input on Sonnet 4.6 would previously have been billed at the premium long-context rate. Under GA, it stays at the standard $3 / MTok. The same pattern holds for Opus 4.6 at $5 / MTok.
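The difference is easy to quantify. A sketch comparing beta-era and GA input billing for Sonnet 4.6, using the rates quoted above and assuming the beta premium applied to the whole request once input crossed 200K tokens (verify that tiering rule against your own invoices):

```python
def beta_input_cost(tokens: int, standard: float = 3.0,
                    premium: float = 6.0, threshold: int = 200_000) -> float:
    """Beta-era input cost in USD: requests above the 200K threshold
    billed at the long-context premium rate."""
    rate = premium if tokens > threshold else standard
    return tokens / 1e6 * rate

def ga_input_cost(tokens: int, standard: float = 3.0) -> float:
    """GA input cost in USD: one flat rate across the full 1M window."""
    return tokens / 1e6 * standard

# A 900K-token Sonnet 4.6 input:
#   beta: 900K at $6 / MTok = $5.40
#   GA:   900K at $3 / MTok = $2.70
```

For a workload that routinely sends 900K-token prompts, that is roughly a halving of marginal input cost, before any savings from prompt caching.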

If you build AI agents that carry forward long traces, code history, tool outputs, and planning state, this directly changes your cost envelope.

Impact on Coding Workflows

Anthropic tied the rollout directly to Claude Code. On March 13, it said 1M context is now included in Claude Code for Max, Team, and Enterprise users with Opus 4.6, and that sessions can use the full window automatically.

That matters because coding sessions accumulate context differently from standard chat. A serious agent loop can include repository maps, diffs, test failures, tool outputs, stack traces, prior plans, and long conversational state. Once the window fills, systems typically compact, summarize, or drop older content.

Anthropic says customers saw a 15% decrease in compaction events. That is a vendor-selected figure, but the mechanism is straightforward. A larger affordable window means less forced summarization and fewer opportunities to lose details that matter for debugging, code review, or multi-step task execution.

For coding workflows, fewer compactions usually means three operational benefits:

  • More stable agent memory across long sessions
  • Less summarization overhead, which reduces latency and failure points
  • Better preservation of exact text, such as API signatures, error logs, and diff context

If your current coding assistant aggressively compresses session state, this rollout is a reason to re-evaluate how much raw context you keep versus summarize.
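One way to frame that re-evaluation is as a compaction trigger. A minimal sketch, where only the 1M window size comes from the announcement and the function name, reserved output budget, and 80% watermark are illustrative defaults:

```python
CONTEXT_WINDOW = 1_000_000  # GA window for Opus 4.6 and Sonnet 4.6

def should_compact(session_tokens: int,
                   reserved_output: int = 64_000,
                   high_watermark: float = 0.8) -> bool:
    """Compact only when the session plus a reserved output budget
    approaches the window. With a 1M window, this fires far later
    than the same policy would under a 200K window."""
    budget = CONTEXT_WINDOW - reserved_output
    return session_tokens >= budget * high_watermark
```

At 500K session tokens, a 200K-window agent would have compacted long ago; under this policy a 1M-window agent still has hundreds of thousands of tokens of headroom.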

The throughput change matters for production systems

Anthropic’s GA post says customers now get their standard account throughput across the entire 1M window. That is easy to miss, but it matters as much as pricing.

A long context window that only works under reduced throughput is harder to schedule in multi-tenant systems. Queue times increase. Bursty agent jobs become harder to manage. Capacity planning gets messy.

Standard throughput at 1M means long-context requests become easier to treat as normal traffic, subject to your account’s usual limits rather than a separate slower tier. If you run background review jobs, incident analysis, or large-batch document processing, this reduces one of the main operational reasons to avoid very large prompts.

Anthropic is also expanding multimodal scope

The media limit increase from 100 to 600 images or PDF pages per request is a substantial change for document-heavy pipelines.

That expansion affects several categories of systems:

  • Large legal or compliance review over many PDFs
  • Research workflows over image-rich documents
  • Multimodal RAG where you pass retrieved pages directly
  • Incident and operations analysis that combines screenshots, dashboards, and logs

Anthropic says this higher media cap is available on Claude Platform natively, Azure Foundry, and Vertex AI. In practice, you should still validate the exact cloud-side rollout in your target environment, because documentation around the March 13 announcement showed some lag between the GA post and older beta-era docs.
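For document-heavy pipelines, the cap change mainly shows up in batching logic. A sketch of greedy batching under the per-request media cap, where the 600-page figure is from the announcement and the helper itself is illustrative:

```python
MEDIA_CAP = 600  # images or PDF pages per request after GA

def batch_pages(page_counts: list[int], cap: int = MEDIA_CAP) -> list[list[int]]:
    """Greedily group documents so each batch stays within the cap.
    A single document larger than the cap would still need
    page-level splitting, which this sketch does not handle."""
    batches, current, used = [], [], 0
    for pages in page_counts:
        if used + pages > cap and current:
            batches.append(current)
            current, used = [], 0
        current.append(pages)
        used += pages
    if current:
        batches.append(current)
    return batches

# Ten 90-page PDFs: two requests under the 600-page cap, versus
# ten requests under the old 100-page cap.
```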

Long context is only useful if quality holds up

Anthropic’s March 13 post makes a quality claim, not just a capacity claim. The company says Opus 4.6 scores 78.3% on MRCR v2, which it presents as a frontier-leading result at 1M-token context.

Earlier Anthropic materials for the Opus 4.6 launch cited 76% on the 8-needle 1M variant of MRCR v2, compared with 18.5% for Sonnet 4.5. The exact reason for the shift from 76% to 78.3% is not explained in the March 13 post, so the safe reading is narrower: Anthropic continues to present Opus 4.6 as a strong long-context retrieval model at 1M tokens, with published MRCR v2 results in the mid-to-high 70s.

That distinction matters for developers because raw context size does not guarantee usable retrieval performance. A 1M-token window helps only if the model can still find and use the relevant information deep inside that prompt. If you work with context windows, this is the core engineering question, not just how many tokens fit.

Architecture Implications

The March 13 rollout does not eliminate the need for retrieval, chunking, or prompt design. It changes where the cost and complexity breakpoints sit.

For many systems, the new tradeoff looks like this:

| Workflow type | Before March 13 | After March 13 |
| --- | --- | --- |
| Large codebase analysis | More pressure to summarize or retrieve narrowly | More viable to keep a larger working set in the prompt |
| Long-running coding agents | Compaction and memory pruning required earlier | Longer raw-session continuity becomes affordable |
| Multi-document review | Split into more passes to avoid premium pricing | Larger single-pass analysis becomes more practical |
| Multimodal document workflows | Tighter cap on pages/images per request | Larger direct-ingestion batches become possible |

This affects LLMs used as orchestrators as much as it affects end-user chat. If you maintain agent frameworks, you can revisit policies such as:

  • when to compact conversation state
  • how much retrieved material to inline versus summarize
  • when to persist exact tool output
  • whether to batch documents into a single pass or multiple passes

For some workloads, especially code review and investigation-style agents, the answer may shift toward preserving more raw source material.
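Those policies are easier to revisit when they are explicit configuration rather than logic scattered through an agent loop. A sketch of one way to surface them as knobs, where every field name and default is illustrative except the 600-page cap from the announcement:

```python
from dataclasses import dataclass

@dataclass
class ContextPolicy:
    compact_at_tokens: int = 800_000       # was often ~160K under a 200K window
    inline_retrieval_tokens: int = 200_000  # raw retrieved text to inline before summarizing
    keep_raw_tool_output: bool = True       # preserve exact logs and diffs, not summaries
    single_pass_doc_pages: int = 600        # pages per request, per the new media cap
```

Raising these limits is how "preserve more raw source material" becomes a configuration change rather than a framework rewrite.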

The 4.6 feature set makes the 1M rollout more relevant

Anthropic’s current 4.6 docs also list extended thinking, adaptive thinking, and support for the rest of the Claude API feature set. The company recommends thinking: { type: "adaptive" } for 4.6 models.

Combined with the 1M GA rollout, this creates a more coherent agent stack: long context, large output budgets, tool use, and sustained reasoning over bigger working sets. Opus 4.6 supports up to 128K output tokens, which is particularly relevant for code transformation, long reports, and structured synthesis. If you rely on structured output, bigger windows plus larger outputs can simplify multi-stage generation pipelines.
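The combined feature set fits in a single request shape. A sketch of the payload, where the thinking parameter, output ceiling, and window come from the material above, and the model ID string and function name are assumptions to verify against current docs:

```python
def build_request(prompt: str, context: str) -> dict:
    """Construct a Messages API payload combining 1M context,
    adaptive thinking, and Opus 4.6's large output budget."""
    return {
        "model": "claude-opus-4-6",        # assumed model ID string
        "max_tokens": 128_000,             # Opus 4.6's stated output ceiling
        "thinking": {"type": "adaptive"},  # recommended setting for 4.6 models
        "messages": [
            {
                "role": "user",
                # Up to ~1M tokens of code, documents, and history
                # can ride along in the prompt at standard pricing.
                "content": f"{context}\n\n{prompt}",
            },
        ],
    }
```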

There are still constraints. Anthropic’s context-window docs note validation errors instead of silent truncation when prompt plus output exceed the limit, and server-side compaction remains available in beta when conversations approach the cap. You still need token accounting and sane budget controls, especially when multimodal inputs and long outputs combine. This is where tokenization and prompt budgeting remain operational concerns.
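That validation behavior is worth mirroring client-side. A sketch of the accounting check, reflecting the rule described above that prompt plus requested output must fit the window or the API rejects the request rather than silently truncating:

```python
CONTEXT_WINDOW = 1_000_000

def check_budget(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> None:
    """Raise before sending, the way the server would reject the call."""
    total = prompt_tokens + max_output_tokens
    if total > window:
        raise ValueError(
            f"prompt ({prompt_tokens}) + max output ({max_output_tokens}) "
            f"= {total} tokens exceeds the {window}-token window"
        )

check_budget(900_000, 64_000)  # fits: 964K <= 1M
# check_budget(950_000, 128_000) would raise ValueError
```

Catching this locally is cheaper than a rejected call, especially for batch jobs where one oversized request can stall a queue.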

Practical Takeaways

The main open question after March 13 is whether developers will treat 1M context as a default or keep it as an exception path for specific workloads.

The benchmark claims support the idea that Opus 4.6 can still retrieve effectively at long range. The pricing and throughput changes remove two of the biggest production blockers. That combination is what makes this release consequential.

If you build coding agents, document analysis systems, or multimodal review pipelines, test a high-context path against your current summarize-and-retrieve baseline. Measure compaction frequency, latency, token cost, and task success on long sessions. That is where Anthropic’s March 13 GA rollout becomes either a real production advantage or just a bigger prompt window.
