Gemini API Gains Streaming Voice Translation in 70 Languages

Google DeepMind has released Gemini 3.5 Live Translate, a streaming speech-to-speech model built for near real-time voice translation. Its native audio-to-audio architecture processes the input stream continuously rather than waiting for each speaker turn to finish, so translated output trails the active speaker by only a few seconds.

The model automatically identifies multilingual inputs across more than 70 supported languages. This autodetection enables over 2,000 language combinations in a single session without manual configuration. Output generation preserves the original speaker’s prosody, mapping the source intonation, pacing, and pitch directly onto the translated audio.
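The "over 2,000" figure is consistent with counting unordered language pairs. As a quick check, assuming exactly 70 languages and that a pair such as English and Japanese counts once regardless of direction:

```python
from math import comb

# Assumption: 70 languages, each unordered pair counted as one combination.
languages = 70
unordered_pairs = comb(languages, 2)          # 70 * 69 / 2
directed_pairs = languages * (languages - 1)  # if each direction counted separately

print(unordered_pairs)  # 2415
print(directed_pairs)   # 4830
```

Either convention clears the 2,000 mark; which one underlies the published figure is not stated.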

For developers building real-time voice agents, the model is engineered for high noise robustness in unpredictable acoustic environments. All generated audio is embedded with SynthID, an imperceptible digital watermark designed to identify AI-generated media and help mitigate misinformation.

Implementation Surfaces

Google is distributing the model across three primary deployment channels.

Google Translate: Available immediately on Android and iOS. A new “Listening mode” allows users to route translated audio directly through the phone earpiece for a standard phone call experience.

Google Meet: Available in private preview this month for select Google Workspace enterprise customers. The integration expands Meet’s previous translation capabilities from five languages to over 70.

Developer API: The model identifier gemini-3.5-live-translate-preview is available in public preview via Google AI Studio and the Gemini Live API. Early integration partners include Grab, Agora, LiveKit, Pipecat, Fishjam, and Vision Agents.

Pricing and API Constraints

The model operates on a strict audio-only input path. Text input is not supported for this specific streaming translation endpoint.
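Because the endpoint accepts only audio, a client typically slices captured microphone audio into short frames and sends each one as it arrives instead of buffering a whole utterance. The sketch below shows only the framing step. It assumes raw 16-bit little-endian mono PCM at 16 kHz, which is the input format documented for the Gemini Live API (confirm it applies to this model), and a hypothetical 20 ms frame size:

```python
from typing import Iterator

# Assumed input format: raw 16-bit (2-byte) little-endian mono PCM at 16 kHz.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20  # hypothetical frame duration; tune for your transport


def frame_bytes(frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of the assumed PCM format."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-size PCM frames in order; the final frame may be shorter."""
    size = frame_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


one_second = bytes(32_000)  # 1 s of silence in the assumed format
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))  # 50 640
```

Each frame would then be passed to your Live API session as soon as it is produced; the send call is omitted because its exact form depends on the SDK and version you use.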

Tier	Input Cost per Million Tokens	Estimated Cost per Minute
Paid	$3.50	$0.0053
Free	$0.00	$0.00
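The per-minute estimate can be reproduced from the per-token price if input audio is billed at about 25 tokens per second. That rate is an assumption inferred from the two paid-tier figures above, not a number published for this model:

```python
# Assumption: audio input is tokenized at about 25 tokens per second.
# This rate is inferred from the pricing table, not from Google's docs.
TOKENS_PER_SECOND = 25
PRICE_PER_MILLION_USD = 3.50  # paid tier, input


def estimate_cost_usd(minutes: float,
                      tokens_per_second: float = TOKENS_PER_SECOND,
                      price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Estimate input cost in USD for a stream of the given length."""
    tokens = minutes * 60 * tokens_per_second
    return tokens * price_per_million / 1_000_000


print(f"{estimate_cost_usd(1):.5f}")   # 0.00525 per minute
print(f"{estimate_cost_usd(60):.3f}")  # 0.315 for a one-hour session
```

Under this assumption one minute is 1,500 tokens, or $0.00525, which rounds to the $0.0053 shown in the table.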

The release expands the current 3.5 generation lineup, which began with the general availability of Gemini 3.5 Flash in May. Google also launched Gemma 4 12B, a unified encoder-free multimodal model, on the same day. Developers can expect Gemini 3.5 Pro, featuring a 2-million-token context window and a Deep Think reasoning mode, later in June 2026.

When integrating the streaming API, account for the model’s audio-only constraint in your payload structure: every request must carry audio, not text. Your application must capture and stream audio continuously, and any text-based fallback, such as typed input, has to be routed to a separate model in the Gemini family.
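One way to structure this is a small dispatcher that sends audio to the translation model and text to a text-capable model. In this sketch the fallback identifier is a hypothetical placeholder, and `Request` and `select_model` are illustrative names rather than part of any SDK:

```python
from dataclasses import dataclass
from typing import Optional

TRANSLATE_MODEL = "gemini-3.5-live-translate-preview"
# Hypothetical placeholder for a text-capable fallback; substitute a real model ID.
TEXT_FALLBACK_MODEL = "your-text-model-id"


@dataclass
class Request:
    """Illustrative request wrapper: exactly one of audio or text is set."""
    audio: Optional[bytes] = None
    text: Optional[str] = None


def select_model(req: Request) -> str:
    """Route audio to the live translation model and text to the fallback."""
    if req.audio is not None and req.text is None:
        return TRANSLATE_MODEL
    if req.text is not None and req.audio is None:
        return TEXT_FALLBACK_MODEL
    raise ValueError("Request must carry either audio or text, not both or neither.")


print(select_model(Request(audio=b"\x00\x00")))  # gemini-3.5-live-translate-preview
print(select_model(Request(text="hola")))        # your-text-model-id
```

Rejecting mixed or empty requests up front keeps a text payload from ever reaching the audio-only endpoint.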
