
Claude 4 Opus Hits Microsoft Foundry With Up to 30% Higher Throughput

Anthropic has made Claude 4 Opus, Sonnet, and Haiku generally available on Microsoft Foundry, offering increased throughput via Blackwell clusters.

Anthropic’s flagship models are now running on specialized Microsoft infrastructure. The general availability of Claude in Microsoft Foundry brings Claude 4 Opus, Sonnet, and Haiku to enterprise tenants that require strict data boundaries. This deployment breaks Anthropic’s historical exclusivity with AWS and Google Cloud, positioning Microsoft Foundry as a neutral hosting layer for competing tier-one foundation models.

Blackwell Clusters and Networking

The Foundry environment relies on customized Blackwell-based clusters and InfiniBand networking, distinct from the standard Azure OpenAI Service. Microsoft reports that this hardware configuration yields up to 30% higher throughput for Claude 4 Opus compared to standard public cloud deployments. For developers building inference-heavy systems, the added throughput can ease bottlenecks in complex retrieval or reasoning pipelines.

The integration provides a unified API compatible with existing Azure SDKs, so engineering teams do not need to rewrite authentication or network access logic. Dropping Claude into an application built for Azure requires only updating the target model string and adjusting parameters to match Anthropic’s context window constraints. The integration also introduces orchestration tools that let developers handle multi-model routing between Claude 4 and GPT-5 through a single control layer.
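The routing idea can be sketched in a few lines of plain Python. This is only an illustration of choosing a model string per task, not Foundry's orchestration API: the model identifiers, the task names, and the token limits below are all placeholder assumptions, and real values would come from your own deployment and the providers' documentation.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical deployment names; real identifiers depend on your Foundry project.
CLAUDE_MODEL = "claude-opus-4"
GPT_MODEL = "gpt-5"


@dataclass(frozen=True)
class Route:
    model: str
    max_input_tokens: int  # placeholder limit, not a published figure


# Assumed task-to-model mapping, purely for illustration.
ROUTES: Dict[str, Route] = {
    "reasoning": Route(model=CLAUDE_MODEL, max_input_tokens=200_000),
    "chat": Route(model=GPT_MODEL, max_input_tokens=128_000),
}


def pick_route(task: str, input_tokens: int) -> Route:
    """Return the route for a task, rejecting prompts over the route's limit."""
    if task not in ROUTES:
        raise KeyError(f"no route configured for task {task!r}")
    route = ROUTES[task]
    if input_tokens > route.max_input_tokens:
        raise ValueError(
            f"{input_tokens} input tokens exceeds the limit for {route.model}"
        )
    return route
```

With this shape, switching a task from one provider to the other is a one-line change to the mapping, which is the property the unified API is meant to preserve.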

Pricing and Reserved Capacity

The release operates on a Foundry Consumption billing model. Claude 4 Opus costs $15.00 per 1 million input tokens and $75.00 per 1 million output tokens. Claude 4 Sonnet is priced at $3.00 for input and $15.00 for output per million tokens.
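To make the per-token prices concrete, here is a minimal cost estimator using the Opus and Sonnet rates quoted above. The dictionary keys and the function name are illustrative choices, not official identifiers, and Haiku is left out because no price for it is given here.

```python
from decimal import Decimal

# USD per 1 million tokens as (input, output), taken from the prices above.
# The key names are illustrative labels, not official model identifiers.
PRICES_PER_MILLION = {
    "claude-4-opus": (Decimal("15.00"), Decimal("75.00")),
    "claude-4-sonnet": (Decimal("3.00"), Decimal("15.00")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the consumption cost in USD for a given token volume."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_MILLION
```

For example, a workload of 2 million input tokens and 500,000 output tokens comes to $67.50 on Opus and $13.50 on Sonnet, a fivefold difference at these rates.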

Organizations with sustained, high-volume workloads can purchase Foundry Slots. These slots provide dedicated compute capacity at a flat monthly rate, which is intended to prevent latency spikes during peak traffic hours. This billing option is particularly relevant for controlling production costs in large-scale applications that generate predictable token volumes.
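Whether a slot pays off depends on volume. The sketch below computes the monthly request count at which pay-as-you-go spend equals a flat slot price. No slot rate is stated here, so `slot_monthly_price` is a made-up input; the function name and the per-request token counts are likewise assumptions for illustration.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def breakeven_requests_per_month(
    slot_monthly_price: Decimal,      # hypothetical flat rate, not a published price
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_million: Decimal,
    output_price_per_million: Decimal,
) -> Decimal:
    """Return the request volume at which consumption cost equals the slot price."""
    cost_per_request = (
        input_price_per_million * input_tokens_per_request
        + output_price_per_million * output_tokens_per_request
    ) / TOKENS_PER_MILLION
    if cost_per_request <= 0:
        raise ValueError("cost per request must be positive")
    return slot_monthly_price / cost_per_request
```

At the Opus rates of $15.00 and $75.00 per million tokens, a request with 4,000 input and 1,000 output tokens costs $0.135. Against an invented slot price of $13,500 per month, the break-even point would be 100,000 requests; above that volume the flat rate is cheaper.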

Sovereign Nodes and Compliance

Foundry strictly isolates tenant data. Microsoft enforces a no-training policy on all Foundry workloads, so proprietary prompts and outputs are never used to train future model weights. At launch, the service operates in US East, US West 2, and Sweden Central.

For highly regulated industries in the EU, Microsoft deployed Sovereign Foundry nodes to maintain geographic and operational compliance. Early adopters like Goldman Sachs and Roche are actively using these isolated environments to run Claude’s reasoning tasks securely inside their existing Microsoft perimeters.

If your architecture relies on multiple providers, evaluate the unified orchestration layer as a way to consolidate your API dependencies. Teams experiencing latency degradation with standard public cloud deployments should benchmark Foundry against their current endpoints to see how much of the reported gain of up to 30% they actually get, and whether it justifies migrating.
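A benchmark of this kind can start as a very small harness. The sketch below is one possible shape, not a prescribed method: `send_request` is a hypothetical callable you would write around your own client, returning the number of output tokens for a prompt, and the clock is injectable so the measurement logic can be tested without network calls.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    send_request: Callable[[str], int],   # hypothetical wrapper around your client
    prompts: Sequence[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return output tokens per second over a sequential run of prompts."""
    start = clock()
    total_tokens = sum(send_request(prompt) for prompt in prompts)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_tokens / elapsed


def relative_gain(baseline_tps: float, candidate_tps: float) -> float:
    """Return the fractional throughput change of candidate over baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_tps - baseline_tps) / baseline_tps
```

Run the same prompt set against the existing endpoint and the Foundry endpoint, then compare the two figures with `relative_gain`. A sequential loop like this measures single-stream throughput only; a production comparison would also need concurrent load and repeated runs.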
