Google Ships 9 Gemini Omni Demos Alongside 3.5 Flash

Google has released nine demonstration videos showcasing Gemini Omni's physics-aware video generation and the benchmark results for Gemini 3.5 Flash.

On May 29, Google followed its I/O 2026 developer conference by releasing nine demonstration videos of Gemini Omni and Gemini 3.5 Flash in action. The videos turn the event’s announcements into concrete targets for developers building long-horizon applications. Gemini 3.5 Flash is now generally available as the default engine for the Gemini app, while Omni introduces a natively multimodal architecture trained to simulate physical environments.

Gemini 3.5 Flash Pricing and Performance

The stable release of gemini-3.5-flash replaces Gemini 3.1 Pro as the baseline for Google Search’s AI Mode. Google engineered the model specifically for long-horizon task execution and coding. It generates output tokens four times faster than comparable frontier models in its tier.

The capability upgrade introduces a steep cost increase. Developers migrating from the previous Gemini 3 Flash Preview face a 3x price hike, fundamentally changing the unit economics for high-volume retrieval applications.

| Specification | Value |
| --- | --- |
| Context Window | 1,048,576 input tokens |
| Max Output | 65,536 tokens |
| Input Price | $1.50 per 1M tokens |
| Output Price | $9.00 per 1M tokens |
| Knowledge Cutoff | January 2025 |
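
As a rough illustration of the arithmetic behind these prices, the Python sketch below estimates the cost of a single request. The function name `request_cost` and the example token counts are hypothetical. The previous rates are not stated here: the sketch assumes the 3x increase applies equally to input and output, which would put the earlier prices at $0.50 and $3.00 per 1M tokens.

```python
# Prices from the specification table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00

# Assumption: the 3x increase applies uniformly to input and output,
# so the earlier rates are derived rather than quoted.
PREV_INPUT_PRICE_PER_M = INPUT_PRICE_PER_M / 3
PREV_OUTPUT_PRICE_PER_M = OUTPUT_PRICE_PER_M / 3


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = INPUT_PRICE_PER_M,
                 output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


if __name__ == "__main__":
    # Hypothetical retrieval-style request: large prompt, short answer.
    new = request_cost(200_000, 2_000)
    old = request_cost(200_000, 2_000,
                       PREV_INPUT_PRICE_PER_M, PREV_OUTPUT_PRICE_PER_M)
    print(f"new: ${new:.4f}  previous (assumed): ${old:.4f}")
```

For the hypothetical request above, 200,000 input tokens and 2,000 output tokens come to about $0.318 at the new rates. Input tokens account for most of that, which is why prompt size tends to dominate the bill for retrieval-heavy workloads.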

Benchmark results reflect the focus on complex execution workflows. The model scored 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas. On multimodal understanding, it reached 84.2% on CharXiv. The larger Gemini 3.5 Pro remains in internal testing and is scheduled to reach developers in June 2026.

Gemini Omni Architecture

Gemini Omni operates as a world model capable of conversational video generation and editing. Omni processes text, audio, images, and video simultaneously as a single unified input. This native multimodality contrasts with pipeline approaches where inputs are translated sequentially before processing.

The model learns and applies physical laws directly to its outputs. The demonstrations highlight Omni simulating fluid dynamics, kinetic energy, and gravity. Users interact with the model to alter specific variables within a generated scene, such as swapping backgrounds or character clothing through natural language prompts.

All outputs carry a SynthID digital watermark to verify machine generation. Gemini Omni Flash is rolling out sequentially to Google AI Plus, Pro, and Ultra subscribers. It will also serve as the backend for content generation in YouTube Shorts and the YouTube Create application.

Autonomous Agent Infrastructure

The demonstrations highlight Google’s shift toward persistent, background execution. Gemini Spark runs autonomously in the cloud, maintaining state and executing multi-step operations independently of local device connectivity. This architecture requires robust multi-agent systems to orchestrate complex dependencies across platforms.

Google also showcased Antigravity 2.0, a standalone development platform built for autonomous execution. During a demonstration, the platform built an operating system and ported the game “Doom” to the new environment in minutes. Decoupling the agent’s environment from the host keeps that execution sandboxed, similar to strategies used to secure AI agents in production.

If you rely on Gemini Flash models for high-volume data processing, review your token budgets against the new $1.50/$9.00 pricing structure. The shift in capabilities supports extensive coding and orchestration tasks, but the 3x cost increase requires strict context management to keep production workflows profitable.
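
One simple form of context management is to cap how much conversation history goes into each request. The Python sketch below keeps only the most recent messages that fit a token budget. It is a minimal illustration, not Google's method: `estimate_tokens` and `trim_history` are hypothetical helpers, and the four-characters-per-token heuristic is an assumption that should be replaced with a real token count from the provider's tooling.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Very rough token estimate.

    Assumption: about 4 characters per token. Real tokenizers differ,
    so treat this as a placeholder for an actual token counter.
    """
    return max(1, len(text) // 4)


def trim_history(messages: List[str],
                 max_input_tokens: int) -> Tuple[List[str], int]:
    """Keep the newest messages whose combined estimate fits the budget.

    Returns the kept messages in their original order together with
    the estimated number of tokens they use.
    """
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that
    # would push the running total over the budget.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_input_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept, used


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
    kept, used = trim_history(history, max_input_tokens=25)
    print(len(kept), used)
```

Dropping the oldest messages is the bluntest option. Summarizing older turns or retrieving only the relevant passages can preserve more signal per token, at the price of extra complexity.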
