Gemini API Gains Streaming Voice Translation in 70 Languages
Google released Gemini 3.5 Live Translate, a streaming speech-to-speech model supporting over 70 languages with near real-time latency and native API access.
Google DeepMind has released Gemini 3.5 Live Translate, a streaming speech-to-speech model built for near real-time voice translation. Its native audio-to-audio architecture processes the input stream continuously rather than waiting for the speaker to finish a turn. This design keeps translated output just seconds behind the active speaker and avoids the wait times of turn-by-turn processing.
The model automatically identifies multilingual inputs across more than 70 supported languages. This autodetection enables over 2,000 language combinations in a single session without manual configuration. Output generation preserves the original speaker’s prosody, mapping the source intonation, pacing, and pitch directly onto the translated audio.
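The article does not say how the "over 2,000" figure is counted. As a rough sanity check, the sketch below assumes exactly 70 languages (a lower bound, since the stated count is "more than 70") and compares two plausible ways of counting combinations.

```python
from math import comb, perm

# Assumption: 70 is used as a lower bound; the article says "more than 70".
SUPPORTED_LANGUAGES = 70

# Unordered pairs: English <-> Japanese is counted once.
unordered_pairs = comb(SUPPORTED_LANGUAGES, 2)

# Directed pairs: English -> Japanese and Japanese -> English count separately.
directed_pairs = perm(SUPPORTED_LANGUAGES, 2)

print(unordered_pairs)  # 2415
print(directed_pairs)   # 4830
```

Counting unordered pairs gives 2,415, which is consistent with the "over 2,000" claim. Counting each translation direction separately would roughly double that.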
For developers building real-time voice agents, the model is engineered for high noise robustness in unpredictable acoustic environments. All generated audio is embedded with SynthID, an imperceptible digital watermark designed to track AI-generated media and mitigate misinformation.
Implementation Surfaces
Google is distributing the model across three primary deployment channels.
Google Translate: Available immediately on Android and iOS. A new “Listening mode” allows users to route translated audio directly through the phone earpiece for a standard phone call experience.
Google Meet: Available in private preview this month for select Google Workspace enterprise customers. The integration expands Meet’s previous translation capabilities from five languages to over 70.
Developer API: The model identifier gemini-3.5-live-translate-preview is available in public preview via Google AI Studio and the Gemini Live API. Early integration partners include Grab, Agora, LiveKit, Pipecat, Fishjam, and Vision Agents.
Pricing and API Constraints
The model operates on a strict audio-only input path. Text input is not supported for this specific streaming translation endpoint.
| Tier | Input Cost per Million Tokens | Estimated Cost per Minute |
|---|---|---|
| Paid | $3.50 | $0.0053 |
| Free | $0.00 | $0.00 |
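The table gives a per-token price and a per-minute estimate but not the token rate that links them. The sketch below backs out the rate implied by the paid tier and uses the per-minute figure for a simple budget estimate. The helper names are illustrative, the implied rate is an inference rather than a published number, and only input cost is covered because the table lists no output price.

```python
# Both constants are taken from the paid tier of the pricing table.
PRICE_PER_MILLION_TOKENS_USD = 3.50
EST_COST_PER_MINUTE_USD = 0.0053


def implied_tokens_per_minute(price_per_million: float, cost_per_minute: float) -> float:
    """Audio tokens per minute implied by the two published figures."""
    return cost_per_minute / price_per_million * 1_000_000


def estimate_input_cost(minutes: float, cost_per_minute: float = EST_COST_PER_MINUTE_USD) -> float:
    """Estimated input cost in USD for a session of the given length."""
    return minutes * cost_per_minute


tokens_per_minute = implied_tokens_per_minute(
    PRICE_PER_MILLION_TOKENS_USD, EST_COST_PER_MINUTE_USD
)

print(round(tokens_per_minute))              # 1514
print(round(tokens_per_minute / 60, 1))      # 25.2
print(round(estimate_input_cost(60), 3))     # 0.318
```

The published per-minute figure is rounded, so the implied rate of roughly 25 tokens per second of audio is approximate. At that rate, an hour of input audio on the paid tier comes to about $0.32.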
The release expands the current 3.5 generation lineup, which began with the general availability of Gemini 3.5 Flash in May. Google also launched Gemma 4 12B, a unified encoder-free multimodal model, on the same day. Developers can expect Gemini 3.5 Pro, featuring a 2-million-token context window and a Deep Think reasoning mode, later in June 2026.
When integrating the streaming API, account for the model’s audio-only constraint in your payload structure. Your application must handle the continuous audio stream itself, and any text-based fallback has to be routed to a separate model in the Gemini family.
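One way to enforce the audio-only constraint is to decide the target model from the payload type before any request is made. The following is a minimal sketch of that routing step, not the Gemini SDK: `AudioChunk`, `TextMessage`, and `select_model` are hypothetical names, and `TEXT_FALLBACK_MODEL` is a placeholder to be replaced with whichever text-capable model the application actually uses. Only the Live Translate identifier comes from the article.

```python
from dataclasses import dataclass
from typing import Union

LIVE_TRANSLATE_MODEL = "gemini-3.5-live-translate-preview"  # identifier from the article
TEXT_FALLBACK_MODEL = "example-text-model"  # hypothetical placeholder, not a real model ID


@dataclass(frozen=True)
class AudioChunk:
    """One slice of raw audio captured from the microphone (hypothetical type)."""
    pcm: bytes


@dataclass(frozen=True)
class TextMessage:
    """A typed message that the audio-only endpoint cannot accept (hypothetical type)."""
    text: str


Payload = Union[AudioChunk, TextMessage]


def select_model(payload: Payload) -> str:
    """Return the model a payload should be sent to, based on its type."""
    if isinstance(payload, AudioChunk):
        if not payload.pcm:
            raise ValueError("audio chunk is empty")
        return LIVE_TRANSLATE_MODEL
    if isinstance(payload, TextMessage):
        # Text never reaches the streaming translation endpoint.
        return TEXT_FALLBACK_MODEL
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")


print(select_model(AudioChunk(pcm=b"\x00\x01")))  # gemini-3.5-live-translate-preview
print(select_model(TextMessage(text="hello")))    # example-text-model
```

Keeping the decision in a single function means text can never be sent to the streaming endpoint by accident, and the fallback model can be changed in one place.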