SageMaker Endpoints Now Expose Native OpenAI Completions API
AWS updated Amazon SageMaker AI to expose standard OpenAI API paths, allowing seamless migration of agent workloads without custom SigV4 wrappers.
On May 20, 2026, AWS launched OpenAI-compatible API support for Amazon SageMaker AI real-time inference endpoints. Developers can now point standard OpenAI ecosystem tools directly at AWS infrastructure by simply changing the base URL. This eliminates the need for custom client-side logic, code rewrites, or complex SigV4 signing wrappers previously required to route traffic to AWS hosted models.
Authentication and API Design
SageMaker AI endpoints now natively expose the /openai/v1/chat/completions path. The endpoints support standard Chat Completions requests and handle streaming responses natively using Server-Sent Events (SSE).
To integrate smoothly with the standard OpenAI SDKs in Python and JavaScript, AWS introduced time-limited bearer tokens. You generate these using the sagemaker.core.token_generator.generate_token function in the SageMaker Python SDK. Tokens remain valid for up to 12 hours, with the exact duration configurable down to a single second.
Security remains tied to your existing AWS IAM credentials. The underlying role executing the requests requires the sagemaker:CallWithBearerToken and sagemaker:InvokeEndpoint permissions to authenticate the token generation and inference invocation.
Multi-Model Routing Capabilities
The update extends beyond single-model deployments. The API supports multi-model hosting through SageMaker AI inference components. You can deploy multiple specialized models to a single endpoint and route requests dynamically.
A single OpenAI-compatible base URL can direct general queries to a Llama instance and domain-specific tasks to a fine-tuned Mistral model. The routing depends entirely on the model name specified in the client request payload. This is highly useful when building multi-step AI agents that require different models for reasoning, data extraction, and synthesis.
Supported Frameworks and Containers
The compatibility layer operates out-of-the-box with popular AI agent frameworks like LangChain and Strands Agents. It removes the friction of deploying enterprise-grade AI inference within a secure Virtual Private Cloud (VPC) while keeping standard open-source tooling intact.
AWS officially supports the SageMaker AI vLLM Deep Learning Container and the SGLang Deep Learning Container. You can also use custom containers, provided they implement the /v1/chat/completions and /ping network paths. The feature is available in 14 AWS regions at launch.
If your application relies on standard OpenAI SDKs or gateways like the Vercel AI SDK, you can now migrate those workloads to dedicated AWS GPU instances. Replace your existing API key with a SageMaker generated bearer token and update the base URL to your endpoint to route traffic through AWS.
Get Insanely Good at AI
The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.
Keep Reading
How to Use the New Unified Cloudflare CLI and Local Explorer
Learn how to use Cloudflare's new cf CLI and Local Explorer to streamline cross-product development and debug local data for AI agents and human developers.
Runpod Flash Removes Container Overhead for AI Inference
The open-source Flash Python SDK allows developers to convert local functions into auto-scaling serverless AI inference endpoints without Dockerfiles.
Agents Can Provision Cloudflare Accounts via Stripe Projects
Cloudflare has partnered with Stripe to launch a protocol allowing AI agents to autonomously create accounts, manage billing, and register domains.
Claude Platform Goes GA on AWS With Native API Parity
Anthropic has launched the Claude Platform on AWS in general availability, granting developers native API parity directly within their AWS environments.
Pentagon Approves Eight AI Vendors For IL7 Classified Networks
The Department of War has authorized models from OpenAI, Google, and six other vendors for classified networks following its dispute with Anthropic.