xAI Ships 2-Minute Voice Clones and Grok 4.3 APIs
xAI has introduced a fast custom voice cloning suite and a new Voice Library alongside the launch of its 1M-context Grok 4.3 model.
On April 30, 2026, xAI released Custom Voices alongside Grok 4.3, pairing a voice cloning suite with a model that has a 1,000,000-token context window. The update lets developers clone a human voice from a short audio sample in under two minutes and deploy it directly via the Grok Text-to-Speech (TTS) and Voice Agent APIs.
Voice Cloning and Library Features
The new Custom Voices feature generates a production-ready voice clone from a reference audio clip. While shorter clips work, xAI specifies that recordings between 90 and 120 seconds yield optimal quality. Cloned voices are managed in a new Voice Library within the xAI console, which also houses a catalog of over 80 built-in voices supporting 28 languages.
Once the clip is processed, the system assigns a unique voice_id. This ID works as a drop-in replacement for default voices in existing Grok Voice implementations. Custom clones inherit the full Grok Voice stack, meaning developers can use Speech Tags like [laugh], [sigh], or <whisper> to shape the cloned audio output dynamically. If you build real-time voice agents, the WebSocket integration supports these custom voice IDs natively.
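
As a rough illustration of how a cloned voice_id and Speech Tags might combine in a TTS request, the sketch below only builds a JSON body and sends nothing over the network. The field names (`voice_id`, `text`), the example ID `voice_custom_123`, and the closing `</whisper>` form are assumptions made for illustration, not confirmed details of the xAI API.

```python
import json


def whisper(text: str) -> str:
    # Assumed wrapping form; the article only names the <whisper> tag itself.
    return f"<whisper>{text}</whisper>"


def build_tts_request(voice_id: str, text: str) -> str:
    """Return a JSON body for a hypothetical TTS request."""
    if not voice_id:
        raise ValueError("voice_id must be non-empty")
    # Hypothetical field names; check the xAI API reference for the real schema.
    return json.dumps({"voice_id": voice_id, "text": text})


# Inline tags such as [laugh] sit directly in the text to be spoken.
script = "That was close. [laugh] " + whisper("Do not tell anyone.")
body = build_tts_request("voice_custom_123", script)
```

Because the custom ID is described as a direct swap, only the `voice_id` value should need to change when moving from a built-in voice to a clone, assuming the request shape is otherwise identical.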
Security Verification and Geographic Limits
To mitigate unauthorized cloning, xAI uses a two-stage verification process during clone generation. First, the target speaker must read a system-provided verification phrase live, and the xAI Speech-to-Text (STT) engine transcribes this live feed to confirm active participation. Second, the system extracts speaker embeddings from both the live phrase and the primary reference audio, comparing them to verify they belong to the same person.
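
The two stages can be sketched as a transcript check followed by an embedding comparison. This is a minimal illustration under stated assumptions: xAI has not published its similarity metric or threshold, so the cosine similarity, the 0.75 cutoff, and every function name here are hypothetical, and the embeddings are plain lists of floats standing in for real speaker embeddings.

```python
import math
import re


def normalize(text: str) -> str:
    # Lowercase and drop punctuation so STT formatting differences are ignored.
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def cosine_similarity(a, b) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_speaker(expected_phrase, transcript, live_emb, ref_emb, threshold=0.75):
    # Stage 1: the live transcript must match the system-provided phrase.
    if normalize(transcript) != normalize(expected_phrase):
        return False
    # Stage 2: the live and reference embeddings must be close enough
    # (assumed metric and threshold) to count as the same speaker.
    return cosine_similarity(live_emb, ref_emb) >= threshold
```

Running the transcript check first means a replayed recording of the wrong phrase is rejected before any embedding comparison happens.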
Custom voices are strictly scoped to the generating team’s workspace. They are not pooled into xAI’s public training data or accessible to other organizations. Due to biometric privacy laws, the feature is geographically restricted. It is currently available only in the United States, with a hard block on usage within Illinois.
API Pricing and Grok 4.3
xAI does not charge a premium for custom voice inference. Using a cloned voice_id costs the same as the standard Grok Voice tiers. This infrastructure update ships alongside Grok 4.3, xAI’s new flagship model, which offers a 1,000,000-token context window.
| Service | Pricing |
|---|---|
| Grok Text-to-Speech (TTS) | $4.20 per 1 million characters |
| Grok Voice Agent | $3.00 per hour |
| Grok 4.3 (Input) | $1.25 per 1 million tokens |
| Grok 4.3 (Output) | $2.50 per 1 million tokens |
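
To make the rates above concrete, here is a small cost estimator built only from the table's figures. The function names are illustrative, and the per-minute proration of the hourly Voice Agent rate is an assumption, since the article does not say how partial hours are billed.

```python
from decimal import Decimal

# Rates copied from the pricing table, in US dollars.
TTS_PER_MILLION_CHARS = Decimal("4.20")
VOICE_AGENT_PER_HOUR = Decimal("3.00")
GROK_INPUT_PER_MILLION_TOKENS = Decimal("1.25")
GROK_OUTPUT_PER_MILLION_TOKENS = Decimal("2.50")
MILLION = Decimal(1_000_000)


def tts_cost(characters: int) -> Decimal:
    return TTS_PER_MILLION_CHARS * characters / MILLION


def voice_agent_cost(minutes: int) -> Decimal:
    # Assumes linear proration of the hourly rate (not confirmed by xAI).
    return VOICE_AGENT_PER_HOUR * minutes / 60


def grok_cost(input_tokens: int, output_tokens: int) -> Decimal:
    return (GROK_INPUT_PER_MILLION_TOKENS * input_tokens
            + GROK_OUTPUT_PER_MILLION_TOKENS * output_tokens) / MILLION
```

For example, `tts_cost(250_000)` comes to $1.05, and a request that fills 800,000 input tokens and returns 20,000 output tokens also comes to $1.05 under these rates.
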
This positions Grok 4.3 aggressively against competitor models from OpenAI and Anthropic, particularly for high-volume agentic tasks that require extensive context retention alongside low-latency audio generation. The Custom Voices feature is initially rolling out to SuperGrok and X Premium+ subscribers.
If you integrate the Grok Voice Agent API, you can swap out standard system voices immediately by passing the new custom voice_id in your WebSocket connection payloads. The latency profile remains unchanged, allowing applications to maintain real-time conversational speeds with domain-specific or branded voices.