Google AI Edge Eloquent brings free offline dictation to iOS
Google's new AI Edge Eloquent app uses Gemma 4 models to offer high-quality, offline-first transcription and text polishing for free on iPhone.
Google launched Google AI Edge Eloquent for iOS on April 6, delivering an offline-first dictation tool powered entirely by local models. The free application provides real-time transcription and automatic text polishing without requiring a subscription or internet connection. For developers building audio-first applications, the release demonstrates what current on-device processing can deliver in practice.
Architecture and Local Inference
Eloquent runs on Google’s Gemma 4 open models. Users select between two edge-optimized variants. The Effective 2B (E2B) model prioritizes processing speed and lower resource consumption, while the Effective 4B (E4B) model delivers higher transcription accuracy.
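The speed-versus-accuracy trade-off between the two variants can be sketched as a simple selection policy. This is a hypothetical illustration: the function name, memory threshold, and model identifiers below are assumptions, not values documented by Google.

```python
def pick_variant(available_ram_gb: float, prefer_accuracy: bool = False) -> str:
    """Return the edge variant suited to the device (illustrative only).

    E2B favors processing speed and a smaller resource footprint;
    E4B favors transcription accuracy at a higher resource cost.
    """
    # Assume E4B needs meaningfully more memory headroom than E2B
    # (the 6 GB threshold is a made-up example, not a documented figure).
    if prefer_accuracy and available_ram_gb >= 6.0:
        return "gemma-4-e4b"
    return "gemma-4-e2b"

print(pick_variant(4.0))                        # constrained device: speed-first
print(pick_variant(8.0, prefer_accuracy=True))  # roomy device: accuracy-first
```

In practice an app like this would also factor in thermal state and battery level, but the core decision reduces to this kind of capability check.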
The application stack operates under the Google AI Edge brand, representing Google’s consolidated tooling for on-device machine learning. It relies on XNNPACK libraries and utilizes Armv9 CPU instructions, including SME2, to accelerate neural network execution on modern iPhone silicon. If you deploy models for local AI inference, this implementation shows how to leverage specific hardware instructions to maintain low latency without cloud compute.
Transcription and Text Processing
The primary loop captures audio, transcribes it in real time, and strips filler words before automatically copying the final output to the iOS clipboard. Users can apply rewrite modes to format the raw text. Options include summarizing into bulleted key points, adjusting for formal tone, or constraining the output length.
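The filler-stripping step described above amounts to a post-transcription cleanup pass. A minimal sketch, assuming a simple token filter; the filler list and function name are illustrative, not Eloquent's actual implementation:

```python
# Hypothetical filler list; Eloquent's actual list is not documented.
FILLER_WORDS = {"um", "uh", "er", "ah", "hmm"}

def strip_fillers(transcript: str) -> str:
    """Drop common filler words from a raw transcript."""
    cleaned = []
    for token in transcript.split():
        # Compare the bare word, ignoring case and trailing punctuation.
        if token.lower().strip(".,!?") in FILLER_WORDS:
            continue
        cleaned.append(token)
    return " ".join(cleaned)

print(strip_fillers("Um, so the, uh, meeting moved to Tuesday."))
# → so the, meeting moved to Tuesday.
```

A production pipeline would likely operate on the recognizer's token stream rather than on the final string, so disfluencies can be dropped before they ever render on screen.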
An offline toggle ensures all audio and text processing stays strictly on the device. In offline mode, the application requires no login and sends no telemetry to Google servers. Users can optionally enable a cloud-based Gemini integration for more complex text transformations.

Accuracy improves through a local dictionary feature: users configure custom jargon manually or sign into Gmail to automatically pull frequently used names and industry terms from sent messages. The application documentation references an upcoming Android build, though Eloquent is currently restricted to the iOS App Store.
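One plausible way to apply such a local dictionary is to snap near-miss tokens onto the user's terms after transcription. The sketch below uses Python's standard-library fuzzy matching; this approach, the term list, and the 0.8 cutoff are assumptions for illustration, not Eloquent's actual mechanism:

```python
import difflib

# Hypothetical user-supplied jargon list.
CUSTOM_TERMS = ["Kubernetes", "XNNPACK", "Gemma"]

def apply_dictionary(transcript: str, terms: list[str], cutoff: float = 0.8) -> str:
    """Replace tokens that closely match a custom term with that term."""
    corrected = []
    for token in transcript.split():
        # get_close_matches returns the best fuzzy match above the cutoff.
        match = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)

print(apply_dictionary("deploy to Kubernetess with XNPACK", CUSTOM_TERMS))
# → deploy to Kubernetes with XNNPACK
```

A real recognizer would more likely bias its decoding toward these terms during inference rather than patch the output afterward, but the post-hoc version shows the idea with no model dependency.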
Pricing and Market Position
The dictation market recently consolidated around subscription-based cloud applications. Wispr Flow and Willow both charge approximately $15 per month for transcription powered by remote APIs. Superwhisper provides local transcription but costs roughly $85 per year and focuses primarily on macOS.
Google released Eloquent entirely free of charge with no usage caps. The underlying Gemma 4 models carry an Apache 2.0 license, providing a blueprint for developers who want to run models locally and compete directly with premium subscription tools.
If you build transcription or voice tools, evaluate the Gemma 4 E2B and E4B variants for your on-device workloads. The Apache 2.0 license allows you to embed these edge models directly into your own mobile applications without passing recurring API costs to your users.